Zhangzhe's Blog
The projection of my life.
Tag: MoE
2024
12-31 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
12-27 DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
12-27 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model