Hacker News
by srush 234 days ago
There is a footnote that should help with the models. Training is a harder thing to report on, but roughly our finding here is that RL scales.