Hacker News new | ask | show | jobs
by numeri 115 days ago
and if I was to guess, the latest generation of models (Claude Opus 4.6, GPT-5.3-codex, etc.) differ from Opus 4.5, GPT 5.2 primarily in the addition of deeper, more difficult (most likely agentic and coding-based, like Terminal Bench) tasks to their RLVR training.

I could be completely off, as my intuition here is fully based on public research papers, but it seems to explain the current state of things fairly well.