| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by alwa 848 days ago

I may be misunderstanding the way LLM practitioners use the word “hallucination,” but I understood it to describe it as something different from the kind of “random” nonsense-word failures that happen, for example, when the temperature is too high [0].

Rather, I thought hallucination, in your example, might be something closer to a grizzled old salesman-map-draftsman’s folk wisdom that sounds like a plausibly optimal mapping strategy to a boss oblivious to the mathematical irreducibility of the problem. Imagining a “fact” that sounds plausible and is rhetorically useful, but that’s never been true and nobody ever said was true.

It’ll still be, like your human in the example, better than average (if “average” means averaged across the universe of all possible answers), and maybe even useful enough to convince the people reading the output, but it will be nonetheless false.

[0] e.g. https://news.ycombinator.com/item?id=39450669

2 comments

fauigerzigerk 848 days ago

If a driver is tasked with visiting a number of places, they will probably choose a reasonably good route. If the driver claims to have found the optimal route, it may not be true, but it's still not a hallucination and it's still a pretty good route.

The driver certainly cannot be relied on to always find an exact solution to an NP-complete problem. But failure modes matter. For practical purposes, the driver's solution is not simply "false". It's just suboptimal.

If we could get LLMs to fail in a similarly benign way, that would make them far more robust without disproving what the posted paper claims.

link

pixl97 848 days ago

> but it will be nonetheless false.

Only if you're assuming all questions have binary answers.

For example in the traveling salesman problem you don't have to compute all answers to start converging on an average. A random sampling of solutions can start setting a bounds for average, and your grizzled salesmans guesses would fall somewhere on that plot. If they are statistically better than average then they are far more than good enough. Unless of course you think burning up the observable universe in finding the best solution is the only way to solve the problem of which trip uses the least gas?

link