by Leynos 252 days ago
Apologies for the second reply, but it also occurs to me that reinforcement learning is the new battleground. Look at the changes between o1, o3, and GPT-5 Thinking, or between Sonnet 3.7, Sonnet 4, and Sonnet 4.5, and so forth.

I expect models will get larger again once everyone is doing their inference on B200s, but the RL training budget is where the insatiable appetite sits right now.