Hacker News
by simonw 27 days ago
The leap from GPT-4 to GPT-5.5 has been astounding in my opinion. There is no way GPT-4 could run a coding agent harness like Codex at even a fraction of the quality that GPT-5.5 does.
2 comments

I don’t think that’s necessarily evidence that GPT-5.5 is an astoundingly more intelligent model, however. An alternative interpretation is that GPT-5.5 was trained on tool usage/harness patterns and has been optimized for this use case.

I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.

On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.

Edit: I should broaden this to domains that naturally lend themselves to RLVR (reinforcement learning with verifiable rewards) training. Models are drastically better at math now.

None of this matters in the product: it is either capable of agentic loop workflows or it isn’t. A 10% improvement in the probability of single-task success makes or breaks the use case.
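
To put rough numbers on why a small per-step gain matters so much, here is a minimal sketch. It assumes (my simplification, not something measured) that a loop only succeeds if every step succeeds and that steps fail independently; the function name `loop_success`, the 20-step length, and the 0.85/0.95 rates are all made up for illustration.

```python
def loop_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    return p_step ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step success rates, 10 percentage points apart.
    for p in (0.85, 0.95):
        print(f"p_step={p:.2f}: 20-step loop succeeds {loop_success(p, 20):.1%} of the time")
```

Under those assumptions the 20-step loop goes from succeeding roughly 4% of the time to roughly 36% of the time, which is the difference between unusable and usable with retries. Real agents can recover from failed steps, so treat this as a pessimistic toy model.
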
For me, any of the Codex models runs circles around the non-Codex models for Codex usage.

I'm not sure why you're so obsessed with the non-Codex versions.