by jug 373 days ago

I doubt it will have. OpenAI planned to release GPT-5 in 2024 or early 2025, but it underwhelmed, and anonymous OpenAI sources have claimed that the later GPT-4.5 was actually GPT-5 relabelled to set expectations. It was seen as roughly a 20% improvement over GPT-4o. This is when it sank in for OpenAI that they were at the end of the road for non-reasoning models. Scaling issues made them too costly.

Turning to their reasoning models, it’s also known and documented through SimpleQA and PersonQA that OpenAI o3 hallucinates more than o1, and o4-mini even more than o3. There’s an unmanaged issue where training on synthetic data improves benchmark results on STEM tasks but increases hallucination rates, and for some reason it troubles OpenAI models in particular (my guess: they’re fine-tuned to take risks, since that’s also known to increase the likelihood of getting hard tasks right).

According to a comment I saw from an anonymous Googler, Google has long known that OpenAI struggles with hallucinations more than they do. This has been verified by the aforementioned benchmarks. Anthropic also struggles less. But as far as I can tell, they’re all facing issues with synthetic data acting like a double-edged sword.

So GPT-5 is going to be interesting. Exactly how well it does will say a lot about the kind of trouble OpenAI is in right now. Maybe OpenAI has found a novel approach to reducing hallucinations? I think that’s among their most crucial points right now. But other than this, no, I don’t expect a revolution, only an evolution. They might currently win benchmarks, but it will hardly be something that catapults them.

If GPT-5 underwhelms, it will send a stronger signal than merely that GPT-5 underwhelms. Because then OpenAI has trouble with both non-reasoning and reasoning models, and we’re likely to be looking at the end of the road on the horizon for current GPT-based LLMs, one where the winner will probably ultimately be cheaper open-weight models once they catch up.