by simonw
505 days ago
The $5.5m in compute wasn't for R1; it was for DeepSeek v3. The R1 trick looks like it may be a whole lot cheaper than that. R1 apparently used just 800,000 samples. I don't fully understand the processing needed on top of those samples, but I get the impression it's a whole lot less compute than the $5.5m used to train v3.