Hacker News
by sudosysgen 502 days ago
The R1-Zero paper shows how many training steps the RL took, and it's not many. The cost of the RL stage is likely a small fraction of the cost of training the foundation model.