Hacker News
by machiaweliczny 501 days ago
This is very bearish for current AI. Seems like 99% reliability is still too low once errors compound. But I wonder if this is inherent to longer context or if it just depends on how the model is trained. In theory, longer context => more errors.
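To put rough numbers on the compounding: if you assume every step succeeds independently with the same probability p (a simplification, since real model errors are probably correlated), the chance of getting through n steps with no error is p^n. A quick sketch, where the function name and the step counts are only illustrative:

```python
def chain_success(per_step: float, steps: int) -> float:
    """Chance that all `steps` steps succeed, assuming independent steps
    that each succeed with probability `per_step` (an assumption, not a
    measured property of any model)."""
    return per_step ** steps


# Illustrative step counts; 0.99 is the "99% reliability" figure above.
for n in (10, 50, 100, 500):
    print(f"{n:>4} steps at 99%: {chain_success(0.99, n):.1%} chance of no error")
```

Under that assumption, a 100-step chain at 99% per step finishes error-free only a bit more than a third of the time, which is why 99% can still be too low.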

That said, I think people are the same: give them too big a problem and they get lost unless they take it in bites. So it seems like OpenAI’s implementation is just bad, because o3’s hallucination benchmark shouldn’t lead to such poor performance.