Posting multiple links and asserting that somewhere within one of them the reader will find confirmation of an apparently absurd statement amounts to an attempted DoS attack on the reader's attention. It's not a sign of good faith. Obviously a model that hallucinates 26% of the time on typical tasks would be of no interest to anyone outside a research environment, so whatever the real story is, it's safe to say it's in there somewhere. It's just not my job to look for it.

On some classes of queries, weak models will hallucinate closer to 100% of the time. One of my favorite informal benchmarks is to throw a metaphorical dart at a map and ask what's special about the smallest town nearby. That's a good trick if you want to observe genuine progress being made in the latest models.

On other tasks, typically the ones that matter, hallucination rates are approaching zero. Not quickly enough for my preference, but the direction is clear enough.