	by Rhapso 620 days ago
	"Hallucination" isn't really a problem that can be "fixed". It's just model error. The root problem is simply that the model doesn't capture reality, just an approximation of it. What we are incorrectly calling "hallucination" is just the best the model has to offer.


spencerchubb 620 days ago

it's not "just" model error

during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess

large transformer models are really good at approximating their dataset. there is no data on the internet about what LLMs know. and even if there were such data, it would probably become obsolete soon

that being said, maybe a big shift in the architecture could solve this. I hope!

happypumpkin 619 days ago

> it would probably become obsolete soon

Suppose there are many times more posts about something one generation of LLMs can't do (arithmetic, tic-tac-toe, whatever), than posts about how the next generation of models can do that task successfully. I think this is probably the case.

While I doubt it will happen, it would be somewhat funny if training on that text caused a future model to claim it can't do something that it "should" be able to because it internalized that it was an LLM and "LLMs can't do X."

spencerchubb 619 days ago

also presumes that the LLM knows it is an LLM

adwn 619 days ago

System prompts sometimes contain the information that "it" is an LLM.

Maybe in the future, those prompts will include motivational phrases, like "You can do it!" or "Believe in yourself, then you can achieve anything."

Vecr 619 days ago

They're generally fine tuned not to. I'm not sure how long that will hold though.

ykonstant 618 days ago

- Are you an LLM?

- As a Large Language Model, I am fine tuned to be unable to answer this question.

singularity2001 619 days ago

in another paper which popped up recently, they approximated uncertainty with entropy and inserted "wait!" tokens whenever entropy was high, simulating chain of thought within the system.
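A minimal sketch of that idea, not the paper's actual method: compute the Shannon entropy of each next-token distribution and emit a marker when it exceeds a threshold. The function names, the 1.5-bit threshold, and the toy distributions are all assumptions made for illustration.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a next-token distribution (probabilities sum to 1)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def generate_with_wait(steps, threshold_bits=1.5, marker="wait!"):
    """steps: one dict (token -> probability) per position.

    Greedily picks the top token at each position and emits `marker`
    first whenever the distribution's entropy exceeds the threshold.
    A real system would let the model continue after the marker;
    this toy version only shows where the marker would go.
    """
    out = []
    for dist in steps:
        if shannon_entropy(dist.values()) > threshold_bits:
            out.append(marker)
        out.append(max(dist, key=dist.get))
    return out

# Hypothetical distributions: one confident step, one uncertain step.
steps = [
    {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01},
    {"has": 0.25, "had": 0.25, "holds": 0.25, "houses": 0.25},
]
tokens = generate_with_wait(steps)
```

The second step is a uniform choice among four tokens (2 bits of entropy), so the marker lands before it; the first step is nearly deterministic and passes through untouched.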

spywaregorilla 619 days ago

> during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess

The guess can be "I don't know". The base LLM would generally only say "I don't know" if it "knew" that it didn't know, which is not going to be very common. The tuned LLM would be the level responsible for trying to equate a lack of understanding with saying "I don't know".

tucnak 620 days ago

I'm led to believe this is mostly because "known unknowns" are not well-represented in the training datasets... I think, instead of bothering with refusals and enforcing a particular "voice" with excessive RL, they ought to focus more on identifying "gaps" in the datasets and feeding them back, perhaps they're already doing this with synthetic data / distillation.

dilap 620 days ago

it can be fixed in theory if the model knows-what-it-knows, to avoid saying things it's uncertain about (this is what (some) humans do to reduce the frequency with which they say untrue things).

there's some promising research using this idea, though i don't have it at hand.

hoosieree 620 days ago

LLMs can't hallucinate. They generate the next most likely token in a sequence. Whether that sequence matches any kind of objective truth is orthogonal to how models work.

I suppose depending on your point of view, LLMs either can't hallucinate, or that's all they can do.

ToValueFunfetti 619 days ago

>Whether that sequence matches any kind of objective truth is orthogonal to how models work.

Empirically, this cannot be true. If it were, it would be statistically shocking how often models coincidentally say true things. The training does not perfectly align the model with truth, but 'orthogonal' is off by a minimum of 45 degrees.

viraptor 619 days ago

It matches the training data. Whether the training data matches truth (and whether it's correctly understood - sarcasm included) is a completely separate thing.

> The training does not perfectly align the model with truth, but 'orthogonal'

Nitpicky, but the more dimensions you have, the easier it is for almost everything to be orthogonal. (https://softwaredoug.com/blog/2022/12/26/surpries-at-hi-dime...) That's why averaging embeddings works.
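The near-orthogonality claim can be checked empirically. The following is a rough sketch using only the standard library; the dimensions, pair count, and seed are arbitrary choices made here, not anything from the linked post.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cosine similarity| between random pairs of unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / pairs

low = mean_abs_cosine(3)      # random 3-D directions are often far from orthogonal
high = mean_abs_cosine(1000)  # random 1000-D directions are close to orthogonal
```

In three dimensions the average absolute cosine is large; in a thousand dimensions it shrinks toward zero, which is the sense in which "almost everything is orthogonal" there.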

ToValueFunfetti 619 days ago

I went to school to learn about the world and the overwhelming majority of that learning was from professors and textbooks. Whether the professors' beliefs and the textbooks' contents reflected the true properties of the world was a completely separate thing, entirely outside of my control. But I did come away with a better understanding of the world and few would say that education is orthogonal to that goal.

If you add two vectors that don't have a truth component (ie. are orthogonal to the truth), the resulting vector should be no closer to the truth. If you start with random weights and perform some operation on them such that the new weights have a higher likelihood of producing true statements, the operation must not have been orthogonal to the truth. Am I wrong there?
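The linear part of this argument can be written out. The symbols are introduced here only for illustration: $t$ is a hypothetical "truth direction", $a$ and $b$ are vectors orthogonal to it, $w$ is a weight vector and $\Delta$ an update to it.

```latex
a \cdot t = 0,\quad b \cdot t = 0
\;\Longrightarrow\;
(a + b) \cdot t = a \cdot t + b \cdot t = 0

(w + \Delta) \cdot t > w \cdot t
\;\Longrightarrow\;
\Delta \cdot t > 0
```

So, under this simplified linear picture, any update that moves the weights toward $t$ must itself have a nonzero component along $t$.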

viraptor 619 days ago

> But I did come away with a better understanding of the world and few would say that education is orthogonal to that goal.

That's due to the reward function / environment. But even outside extremes like North Korea, lots of education environments value conformity over independent analysis.

timcobb 619 days ago

Isn't this the same thing that happens when you train a human on truths vs falsehoods?

CooCooCaCha 620 days ago

Whenever someone takes issue with using the word “hallucinate” with LLMs I get the impression they’re trying to convince me that hallucination is good.

Why do you care so much about this particular issue? And why can’t hallucination be something we can aim to improve?

AnimalMuppet 620 days ago

I'm pretty sure there's something I don't understand, but:

Doesn't an LLM pick the "most probable next symbol" (or, depending on temperature, one of the most probable next symbols)? To do that, doesn't it have to have some idea of what the probability is? Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

dTal 620 days ago

It doesn't really work like that.

1) The model outputs a ranked list of all tokens; the probability always sums to 1. Sometimes there is a clear "#1 candidate", very often there are a number of plausible candidates. This is just how language works - there are multiple ways to phrase things, and you can't have the model give up every time there is a choice of synonyms.

2) Probability of a token is not the same as probability of a fact. Consider a language model that knows the approximate population of Paris (2 million) but is not confident about the exact figure. Feed such a model the string "The exact population of Paris is" and it will begin with "2" but halfway through the number it will have a more or less arbitrary choice of 10 digits. "2.1I don't know" is neither a desirable answer, nor a plausible one from the model's perspective.
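Point 2 can be made concrete with a toy example. The per-token probabilities below are invented for illustration, not real model output, and the helper names are hypothetical; the sketch only shows where a naive per-token threshold would cut the answer.

```python
# Hypothetical per-token probabilities for a completion of
# "The exact population of Paris is" (illustrative numbers only).
completion = [
    ("2", 0.90),  # confident about the leading digit
    (".", 0.80),
    ("1", 0.60),
    ("4", 0.10),  # from here on, a near-arbitrary choice among 10 digits
    ("8", 0.10),
]

def first_low_confidence_index(tokens, threshold):
    """Index of the first token whose probability is below threshold, or None."""
    for i, (_, p) in enumerate(tokens):
        if p < threshold:
            return i
    return None

def sequence_probability(tokens):
    """Probability of the whole sequence: the product of the token probabilities."""
    prob = 1.0
    for _, p in tokens:
        prob *= p
    return prob

cut = first_low_confidence_index(completion, threshold=0.5)
prefix = "".join(tok for tok, _ in completion[:cut])
naive_answer = prefix + "I don't know"  # what a per-token threshold rule would emit
```

The threshold fires in the middle of the number, reproducing the "2.1I don't know" failure, and the overall sequence probability is tiny even though the model is fairly sure about the approximate figure.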

darkPotato 620 days ago

My understanding is that the hallucination is, out of all the possibilities, the most probable one (ignoring temperature). So the hallucination is the most probable sequence of tokens at that point. The model may be able to predict an "I don't have that information" given the right context. But ensuring that in general is an open question.

viraptor 620 days ago

> Doesn't an LLM pick the "most probable next symbol"

Yes, but that very rarely matters. (Almost never when it's brought up in discussions)

> Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

A low probability doesn't necessarily mean something's incorrect. Responding to your question in French would also have very low probability, even if it's correct. There's also some nuance around what's classified as a hallucination... Maybe something in the training data did suggest that answer as correct.

There are ideas similar to this one though. It's just a bit more complex than pure probabilities going down. https://arxiv.org/abs/2405.19648

anticensor 615 days ago

> Responding to your question in French would also have very low probability, even if it's correct.

It's actually a common utterance in Paris.

anon291 620 days ago

You need to separate out the LLM, which only produces a set of probabilities, from the system, which includes the LLM and the sampling methodology. Sampling is currently not very intelligent at all.

The next bit of confusion is that the 'probability' isn't 'real'. It's not an actual probability but a set of weights that sum to one, which is close enough to how probability works that we call it that. However, sometimes there are several good answers, and so each good answer gets a lower probability because there are 5 of them. A fixed threshold is not a good idea in this case. Instead, smarter sampling methods are necessary. One possibility, when there is seeming confusion, is to put a 'confusion marker' into the text, predict the next output, and train models to refine the answer as they go along. Not sure if any work has been done here, but this seems to go along with what you're interested in.
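A small sketch of why a fixed threshold misfires when several good answers split the mass. The distribution and the grouping of tokens by meaning are both made up for illustration; how a real system would obtain such a grouping is left open.

```python
from collections import defaultdict

# Hypothetical next-token distribution: five equally good phrasings share the mass.
next_token_probs = {
    "big": 0.18, "large": 0.18, "huge": 0.18, "vast": 0.18, "enormous": 0.18,
    "purple": 0.10,
}

# Assumed grouping of tokens by meaning (hand-written here).
meaning_of = {t: "large-ish" for t in ("big", "large", "huge", "vast", "enormous")}
meaning_of["purple"] = "colour"

def mass_by_meaning(probs, groups):
    """Sum token probabilities within each meaning group."""
    totals = defaultdict(float)
    for token, p in probs.items():
        totals[groups[token]] += p
    return dict(totals)

threshold = 0.5
top_token_prob = max(next_token_probs.values())
grouped = mass_by_meaning(next_token_probs, meaning_of)
```

No single token clears the threshold, yet the group of equivalent answers holds about 90% of the mass, so a per-token cutoff would wrongly report confusion here.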

viraptor 620 days ago

> However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them.

That's the result after softmax. If you want to act on the raw results, you can still do that.

anon291 618 days ago

The results before softmax don't sum to one so don't even act like a probability distribution. And that's the point. When you have the pre-softmax activations, there are infinitely many ways to convert them to something probability-like. You can normalize them after taking the square root, the square, raising to three, etc. Or you can exponentiate and for some reason that does better. Either way it's not a 'real' probability distribution.
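To illustrate the point about normalization, here is a sketch comparing softmax with one of the alternatives mentioned (normalizing the squares). The logits are arbitrary example values and `normalize_squares` is a name made up here.

```python
import math

def softmax(logits):
    """Exponentiate and normalize; subtracting the max is for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_squares(logits):
    """Square and normalize. Note this discards the sign of negative logits."""
    squares = [x * x for x in logits]
    total = sum(squares)
    return [s / total for s in squares]

logits = [2.0, 1.0, 0.5]  # arbitrary pre-softmax activations
p_softmax = softmax(logits)
p_squares = normalize_squares(logits)
```

Both results are non-negative and sum to one, but they assign different weights to the same logits, so neither is "the" probability implied by the activations.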

ithkuil 620 days ago

This may work when the next token is a key concept. But when it's a filler word, or part of one of many sequences of words that convey the same meaning in different ways (synonyms, not only at the word level but also at the sentence level), it's harder to know whether the probability is low because the word is genuinely unlikely or because its likelihood is spread across other truthful statements.

skydhash 620 days ago

You would need some kind of referential facts that you hold as true, then some introspection method to align sentences to those. if it can’t be done, the output may be “I don’t know”. But even for programming languages (simplest useful languages), it would be hard to do.

PaulHoule 620 days ago

My guess is the problem is words with high probabilities that happen to be part of a wrong answer.

For one thing, the probability of a word occurring is just the probability of the word occurring in a certain sample; it's not an indicator of truth. (Truth is e.g. the most problematic concept in philosophy, in that just introducing it undermines the truth; see "9/11 truther".) It's also not sufficient to pick a "true" word or always pick a "true" word; rather, the truthfulness of a statement needs to be evaluated based on the statement as a whole.

A word might have a low probability because it competes with a large number of alternatives that are equally likely which is not a reason to stop generation.

visarga 619 days ago

This reminds me it's easy to train similarity models, hard to train identity/equivalence prediction. Two strings can be similar in many ways, like "Address Line 1" and "Address Line 2" or "Position_X" and "Position_Y", yet distinct in meaning. That one character makes all the difference. On the other hand "Vendor Name" is equivalent with "Seller Company" even though they are pretty different lexically.

The dot product, which is at the core of attention, is good for similarity, not identity. I think this is why models hallucinate: how can they tell the difference between "I have trained on this fact" and "Looks like something I trained on"?
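A toy illustration of similarity versus equivalence, using character-trigram counts as a crude stand-in for embeddings. This is an assumption made for the sketch: learned embeddings would likely score the second pair higher, but the first pair would still come out as near-identical.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Bag of character trigrams, lowercased (a purely lexical 'embedding')."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Similar strings with distinct meanings.
near_but_distinct = cosine(
    trigram_vector("Address Line 1"), trigram_vector("Address Line 2")
)
# Equivalent meanings with no lexical overlap.
far_but_equivalent = cosine(
    trigram_vector("Vendor Name"), trigram_vector("Seller Company")
)
```

The two address fields differ in a single trigram and score close to 1, while the two equivalent labels share no trigrams and score 0, so a dot-product score alone cannot say which pair means the same thing.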

atrus 620 days ago

I don't think that fixes it, even in theory, since there's always some uncertainty.