Hacker News new | ask | show | jobs
by AnimalMuppet 613 days ago
I'm pretty sure there's something I don't understand, but:

Doesn't an LLM pick the "most probable next symbol" (or, depending on temperature, one of the most probable next symbols)? To do that, doesn't it have to have some idea of what the probability is? Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

7 comments

It doesn't really work like that.

1) The model outputs a ranked list of all tokens; the probability always sums to 1. Sometimes there is a clear "#1 candidate", very often there are a number of plausible candidates. This is just how language works - there are multiple ways to phrase things, and you can't have the model give up every time there is a choice of synonyms.

2) Probability of a token is not the same as probability of a fact. Consider a language model that knows the approximate population of Paris (2 million) but is not confident about the exact figure. Feed such a model the string "The exact population of Paris is" and it will begin with "2" but halfway through the number it will have a more or less arbitrary choice of 10 digits. "2.1I don't know" is neither a desirable answer, nor a plausible one from the model's perspective.

My understanding is that the hallucination is, out of all the possibilities, the most probable one (ignoring temperature). So the hallucination is the most probable sequence of tokens at that point. The model may be able to predict an "I don't have that information" given the right context. But ensuring that in general is an open question.
> Doesn't an LLM pick the "most probable next symbol"

Yes, but that very rarely matters. (Almost never when it's brought up in discussions)

> Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

A low probability doesn't necessarily mean something's incorrect. Responding to your question in French would also have very low probability, even if it's correct. There's also some nuance around what's classified as a hallucination... Maybe something in the training data did suggest that answer as correct.

There are ideas similar to this one though. It's just a bit more complex than pure probabilities going down. https://arxiv.org/abs/2405.19648

> Responding to your question in French would also have very low probability, even if it's correct.

It's actually a common utterance in Paris.

You need to separate out the LLM, which only produces a set of probabilities, from the system, which includes the LLM and the sampling methodology. Sampling is currently not very intelligent at all.

The next bit of confusion is that the 'probability' isn't 'real'. It's not an actual probability but a weight that sums up to one, which is close enough to how probability works that we call it that. However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them. A fixed threshold is not a good idea in this case. Instead, smarter sampling methods are necessary. One possibility is that if we do have seeming confusion, to put a 'confusion marker' into the text and predict the next output and train models to refine the answer as they go along. Not sure if any work has been done here, but this seems to go along with what you're interested in

> However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them.

That's the result after softmax. If you want to act on the raw results, you can still do that.

The results before softmax don't sum to one so don't even act like a probability distribution. And that's the point. When you have the pre-softmax activations, there are infinitely many ways to convert them to something probability-like. You can normalize them after taking the square root, the square, raising to three, etc. Or you can exponentiate and for some reason that does better. Either way it's not a 'real' probability distribution.
This may work when the next token is a key concept but when it's a filler word or a part of one of many sequences of words that can convey the same meaning but in different ways (synonyms but not only at the word also at the sentence levels) then it's harder to know whether the probability is low because the word is absolutely unlikely or because it's likelihood is spread/shared among other truthful statements
You would need some kind of referential facts that you hold as true, then some introspection method to align sentences to those. if it can’t be done, the output may be “I don’t know”. But even for programming languages (simplest useful languages), it would be hard to do.
My guess is the problem is words with high probabilities that happen to be part of a wrong answer.

For one thing the probability of a word occurring is just a probability of the word occurring in a certain sample, it's not an indicator of truth. (e.g. the most problematic concept in philosophy in that just introducing it undermines the truth, see "9/11 truther") It's also not sufficient to pick a "true" word or always pick a "true" word but rather the truthfulness of a statement needs to be evaluated based on the statement as a whole.

A word might have a low probability because it competes with a large number of alternatives that are equally likely which is not a reason to stop generation.