HN Mirror

Hacker News


	by simonw 505 days ago
	The $5.5m in compute wasn't for R1; it was for DeepSeek v3. The R1 trick looks like it may be a whole lot cheaper than that. R1 apparently used just 800,000 samples. I don't fully understand the processing needed on top of those samples, but I get the impression it takes a whole lot less compute than the $5.5m used to train v3.