Zhangzhe's Blog
The projection of my life.
Great! 138 posts in total. Keep on posting.
2025
01-07
Introduction to DPO for Large Models
01-06
Introduction to RLHF for Large Models
01-04
Introduction to RAG for Large Models
2024
12-31
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
12-27
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
12-27
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
12-26
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
12-26
LOMO: Full Parameter Fine-tuning for Large Language Models with Limited Resources
12-20
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
12-20
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection