Hacker News
by sudosysgen, 502 days ago
The R1-Zero paper shows how many training steps the RL took, and it's not many. The cost of the RL is likely a small fraction of the cost of training the foundation model.