by ahzhou
500 days ago
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek v3 seems to be attributable to MoE and FP8 training. DeepSeek R1’s improvements are from GRPO-based RL. Interesting to note: we have no idea how much R1 cost to train.
To speculate: maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
|
FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?