
by swores 420 days ago

That's not correct, and it seems to be based on a common misunderstanding of how LLMs work. The rough idea is that when the info the model is asked for was in the training data, it "looks it up", not unlike software looking up info in a huge database of general knowledge, and that when that lookup fails it falls back to making stuff up. But that's wrong: the models are doing the exact same thing when they're hallucinating as when they're correct, just with a different result.

Hallucinations happen when the string of tokens the model determines to be most likely turns out to contain incorrect information. That holds regardless of whether the correct information is "missing", or whether the correct information would actually have been output had the model, when selecting the first token of the response, picked the option it considers second best rather than best.

Whether or not a piece of information was in the training set can obviously influence the likelihood of a model hallucinating when asked about the subject, but it can easily hallucinate about stuff that was in the training data, and it can also get things right that weren't in the training data.
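
To make the "same procedure, different result" point concrete, here is a minimal toy sketch. It is not how a real LLM is implemented: `TOY_MODEL`, its probabilities, and the `generate` helper are all made up for illustration, with the prompt imagined as "What is the capital of Australia?". The only thing it shows is that the right and the wrong answer come out of the identical selection loop, and that taking the second-best first token can flip the outcome.

```python
# Hypothetical toy "model": maps the tokens generated so far to next-token
# probabilities. All numbers are invented for illustration only.
TOY_MODEL = {
    (): {"Sydney": 0.55, "Canberra": 0.45},
    ("Sydney",): {"<end>": 1.0},
    ("Canberra",): {"<end>": 1.0},
}


def generate(first_choice_rank=0):
    """Greedy decoding, except the first token is taken at the given rank.

    rank 0 = the option the model considers best, rank 1 = second best.
    """
    context = ()
    output = []
    rank = first_choice_rank
    while True:
        dist = TOY_MODEL[context]
        ranked = sorted(dist, key=dist.get, reverse=True)
        token = ranked[min(rank, len(ranked) - 1)]
        rank = 0  # every later step is plain greedy
        if token == "<end>":
            break
        output.append(token)
        context = context + (token,)
    return " ".join(output)


print(generate(0))  # Sydney   (the "best" first token, factually wrong)
print(generate(1))  # Canberra (the second-best first token, factually right)
```

Nothing in `generate` checks facts or distinguishes a "lookup" from a "guess". Both answers are produced by the same ranking step.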


koakuma-chan 420 days ago

If an LLM happens to know the answer to your question, that answer will have the greatest weight, and will therefore become a non-hallucinated output. Otherwise the output will be hallucinated. Note that a hallucination may manifest as an attempt to extrapolate, which may be successful. If you query an LLM with prior knowledge that the LLM doesn’t know the answer, you are guaranteed to receive a hallucinated output.

Or at least this is how I interpret the term.


swores 420 days ago

But that's not how they actually work.

> "If an LLM happens to know the answer to your question, that answer will have the greatest weight"

An LLM doesn’t “know” anything in the way you’re imagining. It doesn’t have stored facts or indexed knowledge to check against. It just has weights learned from token sequences, and it outputs a next token picked according to the probabilities it assigns given the prompt and prior context (the single most probable one under greedy decoding). That might happen to produce a correct answer (and people are obviously working hard to make the models produce right answers as often as possible), but it might just as easily produce a plausible-sounding but wrong one, even if the correct information was in the training data, because that correct information being there doesn't guarantee it will ever have the highest weighting, let alone the highest weighting in all contexts of previous tokens and at all temperature settings.
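
As a rough sketch of the temperature point, assuming the usual softmax-with-temperature sampling scheme: the token list and logits below are invented, and the function names are hypothetical, not taken from any real library. It only illustrates that a correct token can be the most likely one and still lose a noticeable share of samples, with that share growing as temperature goes up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up scores: the correct token is the top choice, but not by much.
tokens = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    wrong_mass = 1.0 - probs[0]
    print(f"T={temperature}: P(wrong token) = {wrong_mass:.2f}")

print(sample(tokens, logits, 1.0, random.Random(0)))
```

The same weights produce a right or a wrong first token depending only on the draw, which is the sense in which "being in the training data" doesn't settle the outcome.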

You’re right that hallucinations can sometimes look like “extrapolations” that happen to land correctly, but that’s incidental. It’s still doing the same token-by-token probability selection regardless of whether it ends up right or wrong.

Framing it around “missing knowledge” vs “existing knowledge” is a misleading intuition. It’s better to think about it in terms of probability distributions over token sequences: the model’s training biases it toward correct sequences more often than incorrect ones, but there’s nothing fundamental in the architecture that guarantees that if the answer was present in training, it will always beat out wrong guesses.

p.s. It's late at night here and I'm about to go to bed, so apologies if I've not explained well in this comment - I gave it to ChatGPT hoping it could tidy things up for me and it just made a way more confusing version so I'm posting it as is :D Let me know if my explanation still isn't clear and I could try again, or answer any questions you have, tomorrow


koakuma-chan 420 days ago

> An LLM doesn’t “know” anything in the way you’re imagining. It doesn’t have stored facts or indexed knowledge to check against

Neither does your brain and yet you do "know" something.

> but it might just as easily produce a plausible-sounding but wrong one, even if the correct information was in the training data

If the majority of information that was in the LLM's training data said 1 + 1 = 3, the LLM will tell you that 1 + 1 = 3, even if there was some information that said 1 + 1 = 2, and there's nothing wrong with that because the LLM is not supposed to fact-check.

> the model’s training biases it toward correct sequences more often than incorrect ones

No, the model's training biases it toward sequences that appear more frequently.
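
A deliberately crude sketch of that frequency claim, assuming a pure counting model: `train` and `answer` are hypothetical helpers, and a real LLM is trained by gradient descent rather than by counting, so treat this as an analogy for the 1 + 1 = 3 example and nothing more.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which final token follows each prompt in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        *prompt, last_token = sentence.split()
        counts[" ".join(prompt)][last_token] += 1
    return counts


def answer(counts, prompt):
    """Return the most frequently seen continuation for the prompt."""
    return counts[prompt].most_common(1)[0][0]


# Made-up corpus: the wrong statement appears three times, the right one once.
corpus = ["1 + 1 = 3"] * 3 + ["1 + 1 = 2"]
model = train(corpus)

print(answer(model, "1 + 1 ="))  # 3
```

The correct statement is in the training data, yet the model never outputs it, because nothing here measures correctness. Only frequency is counted.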


BoorishBears 420 days ago

It's trivial to prove this is wrong: invert relationships it knows about and it fails to answer based on knowledge it previously demonstrated (even with loads of hints).

https://chatgpt.com/share/680dc86c-f0dc-800d-9f04-57ba2f126a...

https://chatgpt.com/share/680dc90b-de28-800d-92b6-f2ef824777...

Note how applying increasing pressure to answer was what caused the hallucination: hallucinations aren't tied to whether the model "knows" something.

Once the output tokens don't fall into the start of some variation of "I don't know", the model is going to answer regardless of what it knows.


koakuma-chan 419 days ago

This is a logical problem that requires reasoning. Bring the fact that Barry White’s mother is Sadie Marie Carter into the context and it should then be able to infer who Sadie Marie Carter’s son is.


BoorishBears 419 days ago

You're saying it requires reasoning, then suggesting a non-reasoning-based solution (steering the model in-context).

Reasoning doesn't solve it. o3 thinks and comes up empty, and o4-mini thinks and then hallucinates much worse than even 4o did: https://chatgpt.com/share/680e7a5c-ae3c-8004-bff9-5c8a0215a1...


koakuma-chan 419 days ago

The model only knows who the mother of Barry White is. It doesn’t know who the son of Sadie Marie Carter is. By reasoning I meant that it could figure out who the son of Sadie Marie Carter is if you brought into the context that Sadie Marie Carter is the mother of Barry White beforehand.


heylook 420 days ago

> If an LLM happens to know the answer to your question

You're missing the point. It doesn't "know" anything. The only thing it can "know" is the statistical relationships between tokens in its dataset. It doesn't "know" anything about the meaning of those tokens. It doesn't even "know" whether it "knows" anything or not. The best it can do is "Here's a recursively generated string of ASCII codes that are statistically likely to follow each other according to the data corpus."

It's Rashomon. It can point you in the right directions a lot of the time, but there's no getting around the fact that you have to double-check its answers with external sources.

> Or at least this is how I interpret the term.

That's not a very useful interpretation because it's not grounded in technical reality.


koakuma-chan 420 days ago

> It doesn't "know" anything.

The word "know" is an abstraction I use in order to avoid going into technical details.

> That's not a very useful interpretation because it's not grounded in technical reality.

My interpretation aligns with what people generally mean by hallucination, and it's definitely more useful than saying that any output is hallucination.


swores 420 days ago

The difference is: what people generally mean by hallucination is "LLM said something wrong as if it was right". And what you are adding to that in your previous comments is the concept of whether or not the LLM knows the right answer. Which it never does. That's where your interpretation and the general interpretation differ.

I'm afraid I don't personally see how to explain it more clearly, so I'll just say instead that, given multiple people in this thread are telling you your understanding of how LLMs work isn't right, please consider that to at least be a possibility and look into it further rather than digging deeper into your current beliefs.
