by logicchains 504 days ago
It's in the DeepSeek V3 paper, not the R1 paper. https://arxiv.org/html/2412.19437v1#abstract

"assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M."

Note that's for V3, the base model; we don't know how much extra R1 cost to train.
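For anyone who wants to sanity-check the quoted figure, here is a minimal sketch of the arithmetic. It uses only the two numbers in the quote ($2 per GPU hour and $5.576M total); the function and variable names are made up for illustration. It covers V3's rental-price estimate only and says nothing about R1.

```python
def implied_gpu_hours(total_cost_usd: float, price_per_gpu_hour_usd: float) -> float:
    """Back out GPU hours from a total cost and an assumed hourly rental price."""
    return total_cost_usd / price_per_gpu_hour_usd


# Figures taken from the quoted sentence in the V3 paper.
TOTAL_COST_USD = 5_576_000       # "$5.576M"
PRICE_PER_GPU_HOUR_USD = 2.0     # "$2 per GPU hour" (the paper's stated assumption)

gpu_hours = implied_gpu_hours(TOTAL_COST_USD, PRICE_PER_GPU_HOUR_USD)
print(f"{gpu_hours:,.0f} H800 GPU hours")  # prints "2,788,000 H800 GPU hours"
```

The total scales linearly with the assumed rental price, so a different price per GPU hour changes the dollar figure proportionally while the GPU-hour count stays the same.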


I see. Thanks for the source.

So all the claims about DeepSeek R1's cost [0] are indeed bullshit being parroted around...

[0]: https://www.google.com/search?q=deepseek+r1+training+cost

Not really; R1 is post-training on top of V3, which is considerably cheaper than training V3 itself. You can see this in the existence of multiple reproductions of the RL training technique by much smaller labs: https://hkust-nlp.notion.site/simplerl-reason