by boznz 505 days ago
Genuine question: how do you replicate the effort exactly without $5M in compute? And can you test that the published weights etc. are actually those in the model?

Am I missing something?

1 comment

The $5.5M in compute wasn't for R1; it was for DeepSeek v3.

The R1 trick looks like it may be a whole lot cheaper than that. R1 apparently used just 800,000 samples. I don't fully understand the processing needed on top of those samples, but I get the impression it's a whole lot less compute than the $5.5M used to train v3.