by HeavyStorm 87 days ago
There's no "just" in RL. Fine-tuning is very important and can make a lot of difference.

Indeed, this is quite obvious with Claude models vs. Gemini. I fully believe Gemini is a more powerful model, but its post-training process is nowhere near what Anthropic does, which results in Gemini being horrible in coding sessions while Claude is excellent.
Apparently GPT-5 uses the same pretrain as 4o did, hah.