
by tshadley 604 days ago

> ...proving that this one particular piece of the hallucination problem may be conceptually simple.

Everything mentioned in the article boils down to that one particular piece: undetected uncertainty. The architectural constraints it references are all situations that cause uncertainty. Training data gaps of course increase uncertainty.

Their solutions are a shotgun blast of heuristics that all focus on reducing uncertainty (CoT, RAG, fine-tuning, fact-checking) while somehow avoiding actually measuring uncertainty and using that measurement to eliminate hallucinations!
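To make "measuring uncertainty and using that" concrete, here is a minimal sketch, not anything the article or a particular model implements. It assumes you can get a probability for each candidate answer; the function names, the example probabilities, and the 1-bit threshold are all hypothetical choices for illustration. It computes Shannon entropy over the candidates and abstains when the distribution is too flat.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def answer_or_abstain(candidates, max_entropy_bits=1.0):
    """Return the most likely answer, or None if the model looks too unsure.

    `candidates` maps each candidate answer to a (possibly unnormalized)
    probability. The threshold is an arbitrary illustrative value.
    """
    total = sum(candidates.values())
    probs = {answer: p / total for answer, p in candidates.items()}
    if entropy_bits(probs.values()) > max_entropy_bits:
        return None  # abstain instead of guessing
    return max(probs, key=probs.get)


# Hypothetical distributions: one sharply peaked, one nearly flat.
peaked = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
flat = {"1912": 0.30, "1913": 0.25, "1915": 0.25, "1920": 0.20}

print(answer_or_abstain(peaked))  # answers
print(answer_or_abstain(flat))    # abstains
```

The peaked distribution is answered and the flat one is refused, which is the behavior being argued for: the uncertainty signal gates the output rather than only being reduced indirectly.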

1 comments

sfink 603 days ago

You're just renaming "error" to "uncertainty". That is incorrect.

Everything unwanted is error, by definition. All of the heuristics are about reducing error, because that's what the goal is. Some of that error is measurable. Some of it is not. You cannot "actually measure" error in any way other than asking people whether the output is what they want -- and that only works because that's how we're defining error. (It also turns out to not be that great of a definition, since people disagree on a lot of cases.)

You can come up with some metric that you label "uncertainty", and that metric may very well be measurable. But it's only going to be correlated with error, not equal to it.

One random example to illustrate the distinction: training gaps can easily decrease uncertainty. You have lots of mammals in your training data, and none of them lay eggs. You ask "The duck-billed platypus is my favorite mammal! Does it lay eggs?" Your model will be very confident when it responds "No". That is a high-confidence error.
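A toy sketch of that platypus case, under loud assumptions: the "model" here is just a smoothed frequency estimate over a made-up training set of 98 mammals, none of which lay eggs. The names `p_lays_eggs` and `training_mammals` are hypothetical, and no real LLM works this simply. It only shows that a confidence score can be high exactly where the answer is wrong.

```python
def p_lays_eggs(examples, alpha=1.0):
    """Laplace-smoothed estimate of P(lays eggs | mammal) from labeled examples."""
    yes = sum(1 for _, lays in examples if lays)
    return (yes + alpha) / (len(examples) + 2 * alpha)


# Hypothetical training set: 98 mammals, none of which lay eggs.
training_mammals = [(f"mammal_{i}", False) for i in range(98)]

p_yes = p_lays_eggs(training_mammals)   # (0 + 1) / (98 + 2)
answer = "Yes" if p_yes >= 0.5 else "No"
confidence = max(p_yes, 1.0 - p_yes)

# Ground truth: the platypus is a mammal that does lay eggs.
truth = "Yes"
is_error = answer != truth

print(answer, round(confidence, 2), is_error)
```

The gap in the training data (no egg-laying mammals) made the estimate more confident, not less, so a threshold on this confidence score would let the wrong answer through.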

tshadley 603 days ago

> One random example to illustrate the distinction: training gaps can easily decrease uncertainty. You have lots of mammals in your training data, and none of them lay eggs. You ask "The duck-billed platypus is my favorite mammal! Does it lay eggs?" Your model will be very confident when it responds "No". That is a high-confidence error.

This article did not seem to make the mistake of associating hallucination with bad data, so it is hard to see exactly how this is relevant. I mean, you could write an article titled "AI Error: how to reduce it" and frame it entirely in terms of users' perceptions, and I wouldn't make a peep.

My objection is that it is silly to use the word "hallucination" (which suggests insanity/psychosis) and then address it as if LLMs are marginally insane and the solution is straitjacket-like heuristics, when "uncertainty" (which suggests uncertainty) is a far more accurate description of the behavior and points to a far more productive and focused solution.