In general, most of the previous AI "breakthroughs" in the last decade were backed by proper scientific research and ideas:
- AlphaGo/AlphaZero (MCTS)
- OpenAI Five (PPO)
- GPT-1/2/3 (Transformers)
- DALL-E 1/2, Stable Diffusion (CLIP, Diffusion)
- ChatGPT (RLHF)
- Sora (Diffusion Transformers)

"Agents" is a marketing term and isn't backed by anything. There is little data available, so it's hard to have generally capable agents in the sense that LLMs are generally capable.
The technology for reasoning models is the ability to do RL on verifiable tasks, combined with some (as yet unpublished, but well-known) search over reasoning chains, using a (presumably neural) proposal machine for reasoning fragments and a (presumably neural) scoring machine for those fragments.
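To make the "proposer + scorer + verifier" idea concrete, here is a toy sketch under heavy assumptions. Nothing here is how any lab actually does it: `propose`, `score`, and `verify` are hypothetical stand-ins (hand-coded, not neural), the "task" is reaching a target integer with arithmetic steps, and the search is plain beam search chosen only for illustration. The point is the shape: a proposer suggests next fragments, a scorer prunes partial chains, and an exact check on the final answer provides the verifiable reward that RL would train on.

```python
from typing import List, Tuple

# A "reasoning chain" is just a list of fragment strings (toy assumption).
Chain = List[str]


def propose(state: int) -> List[str]:
    """Hypothetical proposer. In a real system this would be a model sampling
    candidate next reasoning steps; here it offers three fixed operations."""
    return ["+3", "*2", "-1"]


def apply_fragment(state: int, frag: str) -> int:
    """Apply one toy reasoning step such as '+3' or '*2' to the current state."""
    op, n = frag[0], int(frag[1:])
    if op == "+":
        return state + n
    if op == "-":
        return state - n
    return state * n


def score(state: int, target: int) -> float:
    """Hypothetical scorer standing in for a learned value model.
    Higher is better; here it is simply the negative distance to the target."""
    return -abs(target - state)


def verify(state: int, target: int) -> bool:
    """Verifiable reward: the final answer is checked exactly, not judged."""
    return state == target


def beam_search(start: int, target: int, beam_width: int = 3,
                max_depth: int = 6) -> Tuple[Chain, float]:
    """Search over chains of fragments; return (chain, reward).
    Reward is 1.0 if a verified chain is found, else 0.0."""
    beam: List[Tuple[int, Chain]] = [(start, [])]
    for _ in range(max_depth):
        candidates: List[Tuple[int, Chain]] = []
        for state, chain in beam:
            for frag in propose(state):
                new_state = apply_fragment(state, frag)
                new_chain = chain + [frag]
                if verify(new_state, target):
                    return new_chain, 1.0
                candidates.append((new_state, new_chain))
        # Keep only the highest-scoring partial chains.
        candidates.sort(key=lambda c: score(c[0], target), reverse=True)
        beam = candidates[:beam_width]
    return [], 0.0


if __name__ == "__main__":
    chain, reward = beam_search(1, 11)
    print(chain, reward)  # ['+3', '*2', '+3'] 1.0
```

In an actual RL setup, the reward returned here would be used to update the proposer (and possibly the scorer), so that verified chains become more likely over training; this sketch only shows the search-and-verify half.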
The technology for agents is effectively the same, plus some method (currently in R&D) for scaling the training architecture to longer-horizon tasks. ChatGPT agent and o3/o4-mini are likely the first published models to take advantage of this research.
It's fairly obvious that this is the direction that all the AI labs are going if you go to SF house parties or listen to AI insiders like Dwarkesh Patel.