HN Mirror

	by sudosysgen 502 days ago
	The R1-Zero paper reports how many training steps the RL stage took, and it's not many. The cost of that RL stage is likely a small fraction of the cost of training the foundation model.