HN Mirror


by anon373839 23 days ago

I don’t think that’s exactly indicative of GPT-5.5 being an astoundingly more intelligent model, however. An alternate interpretation is that GPT-5.5 was trained on tool usage/harness patterns and has been optimized for this use case.

I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.

On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.

*Edit: I should broaden this to domains that naturally lend themselves to RLVR (reinforcement learning with verifiable rewards) training. Models are drastically better at math now.

1 comment

baq 23 days ago

None of this matters in the product: either it is capable of agentic loop workflows or it isn't. A 10% improvement in the probability of single-task success makes or breaks the use case.
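A rough way to see why a small per-task gain can decide whether an agentic loop is usable: if a workflow chains several tasks and every one must succeed, the per-task probabilities multiply. The sketch below assumes independent steps with identical success probability and no retries, which is a simplification not stated in the comment; `loop_success` and the example numbers (80% vs. 90%, up to 20 steps) are hypothetical and chosen only for illustration.

```python
def loop_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed.

    Assumption (illustrative only): steps are independent, share the same
    success probability p_step, and a single failure fails the whole loop.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Compare two hypothetical per-task success rates across loop lengths.
    for p in (0.80, 0.90):
        row = ", ".join(
            f"{n} steps: {loop_success(p, n):.3f}" for n in (1, 5, 10, 20)
        )
        print(f"p={p:.2f} -> {row}")
```

Under these assumptions, a ten-step loop completes about 11% of the time at 80% per-task success and about 35% of the time at 90%, so a ten-point gain per task roughly triples end-to-end success. Real agents can retry and recover from errors, so treat this as an intuition pump, not a measurement.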
