HN Mirror



	by Lord_Zero 95 days ago
	Why no mention of GPT-5.5?


gertlabs 95 days ago

Waiting on public API release. Once it drops, results will be up within 24 hours.


gertlabs 95 days ago

Results are up. GPT 5.5 is a beast.


wahnfrieden 95 days ago

Have you considered running models like GPT 5.5 inside their agent harness (Codex)?


gertlabs 95 days ago

I see the value in that, but there are a few reasons it isn't on the immediate roadmap -- mainly, it shifts the focus from measuring the model to measuring the harness. The agentic benchmark section you see on the site is comparable to how an agent would perform using an open harness like Pi. But the latest tool-using models are pretty well adapted to any harness, so I think the harness is less of a factor in overall model performance.


wahnfrieden 95 days ago

It's just fresh on my mind after reading this from a Codex team member re: the performance difference between Pi and Codex app server usage: https://x.com/pashmerepat/status/2046865863979172039


ZeroGravitas 95 days ago

Well that couldn't be vaguer if he tried. Basically saying, our stuff is better, no reasons given.
