Hallucination rates go down by a few percent with each new model generation, although some milestones have seen regressions, e.g. reasoning models are just overall worse. It is a bit hard to measure and compare across generations because the tests have to change in order to stay ahead of the training data, but the models are generally improving. Check out HalluHard, it’s a stress-test hallucination benchmark.

https://halluhard.com/

There will never be a time when there are zero hallucinations, because of the non-deterministic nature of LLMs, but eventually the frequency of hallucinations will be so low that it doesn’t matter. If the robot makes one mistake for every ten a human makes, that’s coming out ahead (depending on the nature of the mistake, of course).

Also I’m not making any value judgement about the technology and how it’s used and, frankly, I don’t really appreciate the assumption that I am. I’m fucking job hunting right now and it’s a hellscape thanks to LLMs.

I’m just being real about where the tech appears to be going based on its current trajectory and my experience in the industry. There seems to be a lot of cope going around that these things won’t ever be good enough to take our jobs. They will, and sooner than any of us are ready for. Leadership is fine with slop as long as it ships, so the tech as it stands today doesn’t have to be much better to reach that standard.