HN Mirror

	by lafreb 617 days ago
	The paper says that they've improved hallucination mitigation, but not really "solved" the issue.

Rhapso 617 days ago

"Hallucination" isn't really a problem that can be "fixed". Its just model error.

The root problem is simply that the model doesn't capture reality, just an approximation. What we are incorrectly calling "hallucination" is just the best the model has to offer.

spencerchubb 617 days ago

it's not "just" model error

during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess

large transformer models are really good at approximating their dataset. there is no data on the internet about what LLMs know. and even if there were such data, it would probably become obsolete soon

that being said, maybe a big shift in the architecture could solve this. I hope!
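A toy illustration of the pre-training incentive, with made-up probabilities: under cross-entropy loss, probability mass placed on an "I don't know" token that never appears as the training target is simply wasted, so a model that spreads its mass over plausible guesses scores better than one that hedges.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token under a {token: prob} dict."""
    return -math.log(probs[target])

# Hypothetical distributions: the training target is the real answer ("Paris"),
# never "I don't know", so mass on that token only raises the loss.
guesser = {"Paris": 0.5, "Lyon": 0.3, "Nice": 0.2, "I don't know": 0.0}
hedger = {"Paris": 0.25, "Lyon": 0.15, "Nice": 0.1, "I don't know": 0.5}

target = "Paris"
loss_guess = cross_entropy(guesser, target)  # -ln(0.5)
loss_hedge = cross_entropy(hedger, target)   # -ln(0.25), twice as large

print(f"guesser loss: {loss_guess:.3f}")
print(f"hedger loss:  {loss_hedge:.3f}")
```

Halving the mass on the correct answer to make room for "I don't know" doubles the loss here, which is exactly the pressure toward educated guessing.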

happypumpkin 617 days ago

> it would probably become obsolete soon

Suppose there are many times more posts about something one generation of LLMs can't do (arithmetic, tic-tac-toe, whatever), than posts about how the next generation of models can do that task successfully. I think this is probably the case.

While I doubt it will happen, it would be somewhat funny if training on that text caused a future model to claim it can't do something that it "should" be able to because it internalized that it was an LLM and "LLMs can't do X."

spencerchubb 617 days ago

also presumes that the LLM knows it is an LLM

adwn 617 days ago

System prompts sometimes contain the information that "it" is an LLM.

Maybe in the future, those prompts will include motivational phrases, like "You can do it!" or "Believe in yourself, then you can achieve anything."

Vecr 617 days ago

They're generally fine-tuned not to. I'm not sure how long that will hold, though.

ykonstant 615 days ago

- Are you an LLM?

- As a Large Language Model, I am fine-tuned to be unable to answer this question.

singularity2001 617 days ago

in another paper which popped up recently, they approximated uncertainty with entropy and inserted "wait!" tokens whenever entropy was high, simulating chain of thought within the system.
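A rough sketch of that idea as described, not the paper's actual method: compute the entropy of each next-token distribution and emit a "wait!" marker instead of committing when it exceeds a threshold. `toy_model` is a hypothetical stand-in for a real LLM, and the 1.0-nat threshold is arbitrary.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a {token: probability} dict."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def toy_model(tokens):
    """Hypothetical stand-in for an LLM: unsure which number follows "is"."""
    last = [t for t in tokens if t != "wait!"][-1]
    if last == "is":
        return {"2.0": 0.4, "2.1": 0.3, "2.2": 0.3}
    if last.startswith("2."):
        return {"million": 0.95, "<eos>": 0.05}
    return {"<eos>": 1.0}

def generate_with_wait(model, prompt, max_steps=10, threshold=1.0):
    """Greedy decoding that emits "wait!" when next-token entropy exceeds `threshold`."""
    tokens = list(prompt)
    for _ in range(max_steps):
        dist = model(tokens)
        # Flag uncertainty once, rather than emitting "wait!" forever.
        if entropy(dist) > threshold and tokens[-1] != "wait!":
            tokens.append("wait!")
            continue
        token = max(dist, key=dist.get)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(generate_with_wait(toy_model, ["the", "population", "is"]))
# ['the', 'population', 'is', 'wait!', '2.0', 'million']
```

In a real system the model would condition on the "wait!" token and keep reasoning; the toy model ignores it, and the guard against consecutive markers only keeps the loop from stalling.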

spywaregorilla 617 days ago

> during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess

The guess can be "I don't know". The base LLM would generally only say "I don't know" if it "knew" that it didn't know, which is not going to be very common. The tuned LLM would be the level responsible for mapping a lack of knowledge to saying "I don't know".

tucnak 617 days ago

I'm led to believe this is mostly because "known unknowns" are not well represented in the training datasets... I think that, instead of bothering with refusals and enforcing a particular "voice" with excessive RL, they ought to focus more on identifying "gaps" in the datasets and feeding them back. Perhaps they're already doing this with synthetic data / distillation.

dilap 617 days ago

it can be fixed in theory if the model knows what it knows, so that it avoids saying things it's uncertain about (this is what (some) humans do to reduce how often they say untrue things).

there's some promising research using this idea, though I don't have it at hand.
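One rough way to approximate knowing-what-it-knows, sketched under assumptions: ask the same question several times with sampling and only answer when enough samples agree. `sample_answer` is a hypothetical zero-argument stand-in for a stochastic model call, and the 0.7 agreement threshold is arbitrary.

```python
import itertools
from collections import Counter

def answer_or_abstain(sample_answer, n_samples=10, min_agreement=0.7):
    """Return the most common sampled answer if enough samples agree, else abstain."""
    answers = [sample_answer() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    if count / n_samples >= min_agreement:
        return best
    return "I don't know"

# Hypothetical stand-ins: one "model" always gives the same answer,
# the other cycles through guesses, so its samples disagree.
consistent = lambda: "Paris"
guessing = itertools.cycle(["1887", "1889", "1891", "1901"]).__next__

print(answer_or_abstain(consistent))  # Paris
print(answer_or_abstain(guessing))    # I don't know
```

Agreement across samples is only a proxy for knowledge: a model can be consistently wrong, so this reduces guessing without guaranteeing truth.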

hoosieree 617 days ago

LLMs can't hallucinate. They generate the next most likely token in a sequence. Whether that sequence matches any kind of objective truth is orthogonal to how models work.

I suppose depending on your point of view, LLMs either can't hallucinate, or that's all they can do.

ToValueFunfetti 617 days ago

>Whether that sequence matches any kind of objective truth is orthogonal to how models work.

Empirically, this cannot be true. If it were, it would be statistically shocking how often models coincidentally say true things. The training does not perfectly align the model with truth, but 'orthogonal' is off by a minimum of 45 degrees.

viraptor 617 days ago

It matches the training data. Whether the training data matches truth (and whether it's correctly understood - sarcasm included) is a completely separate thing.

> The training does not perfectly align the model with truth, but 'orthogonal'

Nitpicky, but the more dimensions you have, the more likely it is that any two arbitrary vectors are nearly orthogonal. (https://softwaredoug.com/blog/2022/12/26/surpries-at-hi-dime...) That's why averaging embeddings works.
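A quick stdlib-only check of the high-dimensional effect: draw pairs of random Gaussian vectors and measure the mean absolute cosine similarity as the dimension grows. The dimensions, trial count, and seed are arbitrary choices.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_abs_cosine(dim, trials=200, seed=0):
    """Average |cos| between pairs of random Gaussian vectors in `dim` dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0, 1) for _ in range(dim)]
        v = [rng.gauss(0, 1) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / trials

for dim in (2, 32, 1024):
    print(dim, round(mean_abs_cosine(dim), 3))
```

The mean shrinks roughly like 1/sqrt(dim), so in embedding-sized spaces random directions are almost always close to orthogonal.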

ToValueFunfetti 617 days ago

I went to school to learn about the world and the overwhelming majority of that learning was from professors and textbooks. Whether the professors' beliefs and the textbooks' contents reflected the true properties of the world was a completely separate thing, entirely outside of my control. But I did come away with a better understanding of the world and few would say that education is orthogonal to that goal.

If you add two vectors that don't have a truth component (i.e., are orthogonal to the truth), the resulting vector should be no closer to the truth. If you start with random weights and perform some operation on them such that the new weights have a higher likelihood of producing true statements, the operation must not have been orthogonal to the truth. Am I wrong there?
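That argument rests on the linearity of the dot product: if two updates each have zero projection onto a "truth" direction, so does their sum. A toy check in a hypothetical 3-D space where the first axis stands in for "truth":

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

truth = [1.0, 0.0, 0.0]       # hypothetical "truth" direction
update_a = [0.0, 2.0, -1.0]   # orthogonal to truth
update_b = [0.0, -0.5, 3.0]   # also orthogonal to truth
weights = [0.2, 0.7, 0.1]     # arbitrary starting point

new_weights = add(weights, add(update_a, update_b))
# dot(w + a + b, t) = dot(w, t) + dot(a, t) + dot(b, t) = dot(w, t) + 0 + 0
print(dot(weights, truth), dot(new_weights, truth))  # 0.2 0.2
```

So if training does increase the truth component, at least some update had a nonzero projection onto it.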

timcobb 617 days ago

Isn't this the same thing that happens when you train a human on truths vs falsehoods?

CooCooCaCha 617 days ago

Whenever someone takes issue with using the word “hallucinate” with LLMs I get the impression they’re trying to convince me that hallucination is good.

Why do you care so much about this particular issue? And why can’t hallucination be something we can aim to improve?

AnimalMuppet 617 days ago

I'm pretty sure there's something I don't understand, but:

Doesn't an LLM pick the "most probable next symbol" (or, depending on temperature, one of the most probable next symbols)? To do that, doesn't it have to have some idea of what the probability is? Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

dTal 617 days ago

It doesn't really work like that.

1) The model outputs a probability for every token in its vocabulary, and the probabilities always sum to 1. Sometimes there is a clear "#1 candidate"; very often there are a number of plausible candidates. This is just how language works: there are multiple ways to phrase things, and you can't have the model give up every time there is a choice of synonyms.

2) Probability of a token is not the same as probability of a fact. Consider a language model that knows the approximate population of Paris (2 million) but is not confident about the exact figure. Feed such a model the string "The exact population of Paris is" and it will begin with "2" but halfway through the number it will have a more or less arbitrary choice of 10 digits. "2.1I don't know" is neither a desirable answer, nor a plausible one from the model's perspective.
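A toy version of the Paris example, with made-up per-step distributions: the model is confident about the leading "2" and the decimal point but spread evenly over the next digit, so a naive per-token threshold gives up in the middle of the number. The 0.5 threshold is an arbitrary assumption.

```python
# Hypothetical next-token distributions while completing
# "The exact population of Paris is" (numbers are illustrative only).
steps = [
    {"2": 0.9, "1": 0.05, "3": 0.05},
    {".": 0.95, ",": 0.05},
    {str(d): 0.1 for d in range(10)},  # every digit about equally plausible
]

def decode_with_threshold(steps, threshold=0.5):
    """Greedy decoding that bails out with "I don't know" on a low-probability token."""
    out = ""
    for dist in steps:
        token, p = max(dist.items(), key=lambda kv: kv[1])
        if p < threshold:
            return out + "I don't know"
        out += token
    return out

print(decode_with_threshold(steps))  # 2.I don't know
```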

darkPotato 617 days ago

My understanding is that the hallucination is, out of all the possibilities, the most probable one (ignoring temperature). So the hallucination is the most probable sequence of tokens at that point. The model may be able to predict an "I don't have that information" given the right context. But ensuring that in general is an open question.

viraptor 617 days ago

> Doesn't an LLM pick the "most probable next symbol"

Yes, but that very rarely matters. (Almost never when it's brought up in discussions.)

> Couldn't it then, if the probability falls below some threshold, say "I don't know" instead of giving what it knows is a low-probability answer?

A low probability doesn't necessarily mean something's incorrect. Responding to your question in French would also have very low probability, even if it's correct. There's also some nuance around what's classified as a hallucination... Maybe something in the training data did suggest that answer as correct.

There are ideas similar to this one though. It's just a bit more complex than pure probabilities going down. https://arxiv.org/abs/2405.19648

anticensor 613 days ago

> Responding to your question in French would also have very low probability, even if it's correct.

It's actually a common utterance in Paris.

anon291 617 days ago

You need to separate out the LLM, which only produces a set of probabilities, from the system, which includes the LLM and the sampling methodology. Sampling is currently not very intelligent at all.
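To make the LLM/sampler split concrete, here is a minimal sketch in which a hypothetical model has already produced raw logits and a separate sampler turns them into a token index. It uses temperature scaling plus nucleus (top-p) sampling, two common strategies; it is not meant to describe any particular system's sampler.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_sample(logits, p=0.9, temperature=1.0, rng=random):
    """Sample a token index from the smallest set of top tokens whose mass reaches p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    weights = [probs[i] for i in kept]  # choices() renormalizes these
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1, -3.0]  # hypothetical raw model outputs for 4 tokens
print(top_p_sample(logits, p=0.9, rng=random.Random(0)))
```

Everything "intelligent" about when to stop or hedge would have to live in a function like `top_p_sample`, which only ever sees the numbers the model hands it.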

The next bit of confusion is that the 'probability' isn't 'real'. It's not an actual probability but a set of weights that sum to one, which is close enough to how probability works that we call it that. However, sometimes there are several good answers, and so all the good answers get a lower probability because there are 5 of them. A fixed threshold is not a good idea in this case. Instead, smarter sampling methods are necessary. One possibility is that, when there is apparent confusion, we put a 'confusion marker' into the text, predict the next output, and train models to refine the answer as they go along. Not sure if any work has been done here, but this seems to go along with what you're interested in.

viraptor 617 days ago

> However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them.

That's the result after softmax. If you want to act on the raw results, you can still do that.

anon291 616 days ago

The results before softmax don't sum to one, so they don't even act like a probability distribution. And that's the point. When you have the pre-softmax activations, there are infinitely many ways to convert them to something probability-like. You can normalize them after taking the square root, the square, raising to the third power, etc. Or you can exponentiate, and for some reason that does better. Either way, it's not a 'real' probability distribution.
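A small sketch of the point about pre-softmax activations, using hypothetical logits: softmax gives the same distribution if every logit is shifted by a constant, while an arbitrary alternative such as square-then-normalize gives a different answer and even assigns 2.0 and -2.0 the same weight.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def square_normalize(logits):
    """One arbitrary alternative: square each logit, then divide by the sum."""
    sq = [x * x for x in logits]
    total = sum(sq)
    return [s / total for s in sq]

logits = [2.0, 1.0, -2.0]  # hypothetical pre-softmax activations
shifted = [x + 5.0 for x in logits]

print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in softmax(shifted)])          # same as the line above
print([round(p, 3) for p in square_normalize(logits)])  # 2.0 and -2.0 tie
```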

ithkuil 617 days ago

This may work when the next token is a key concept, but when it's a filler word, or part of one of many word sequences that convey the same meaning in different ways (synonyms, not only at the word level but also at the sentence level), it's harder to know whether the probability is low because the word is genuinely unlikely or because its likelihood is spread across other truthful statements.
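A toy sketch of the synonym problem, with made-up numbers: the top token looks uncertain only because its mass is shared with near-synonyms. Summing probabilities over tokens that mean the same thing recovers the model's confidence in the meaning. The meaning groups here are hand-written and hypothetical; a real system would need some form of semantic clustering.

```python
from collections import defaultdict

# Hypothetical next-token distribution after "The movie was really ..."
dist = {"good": 0.22, "great": 0.20, "excellent": 0.18, "superb": 0.15,
        "bad": 0.13, "awful": 0.12}
# Hand-written meaning groups (an assumption for illustration).
meaning = {"good": "positive", "great": "positive", "excellent": "positive",
           "superb": "positive", "bad": "negative", "awful": "negative"}

def grouped_mass(dist, meaning):
    """Sum token probabilities within each meaning group."""
    totals = defaultdict(float)
    for token, p in dist.items():
        totals[meaning[token]] += p
    return dict(totals)

top_token = max(dist, key=dist.get)
print(top_token, dist[top_token])   # good 0.22 (looks uncertain)
print(grouped_mass(dist, meaning))  # "positive" carries about 0.75 of the mass
```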

skydhash 617 days ago

You would need some kind of referential facts that you hold as true, then some introspection method to align sentences to those. If it can’t be done, the output may be “I don’t know”. But even for programming languages (the simplest useful languages), it would be hard to do.
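A minimal sketch of the fact-store idea, assuming claims have already been extracted as (subject, relation, object) triples, which is itself the hard part. The `FACTS` store and the set of single-valued relations are hypothetical.

```python
# Hypothetical store of facts we hold as true.
FACTS = {
    ("Paris", "capital_of", "France"),
    ("Python", "created_by", "Guido van Rossum"),
}
# Hypothetical relations where a subject has exactly one correct object.
SINGLE_VALUED = {"capital_of", "created_by"}

def check(claim):
    """Classify a (subject, relation, object) triple as true, false, or unknown."""
    if claim in FACTS:
        return "true"
    subject, relation, _ = claim
    known = any(s == subject and r == relation for s, r, _ in FACTS)
    if relation in SINGLE_VALUED and known:
        return "false"  # we hold a different object as true for this pair
    return "I don't know"

print(check(("Paris", "capital_of", "France")))    # true
print(check(("Paris", "capital_of", "Germany")))   # false
print(check(("Berlin", "capital_of", "Germany")))  # I don't know
```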

PaulHoule 617 days ago

My guess is the problem is words with high probabilities that happen to be part of a wrong answer.

For one thing, the probability of a word occurring is just the probability of the word occurring in a certain sample; it's not an indicator of truth. (Truth may be the most problematic concept in philosophy, in that merely invoking it undermines it; see "9/11 truther".) It's also not sufficient to pick a "true" word, or even to always pick "true" words; rather, the truthfulness of a statement needs to be evaluated based on the statement as a whole.

A word might have a low probability because it competes with a large number of alternatives that are equally likely which is not a reason to stop generation.

visarga 617 days ago

This reminds me that it's easy to train similarity models but hard to train identity/equivalence prediction. Two strings can be similar in many ways, like "Address Line 1" and "Address Line 2" or "Position_X" and "Position_Y", yet distinct in meaning. That one character makes all the difference. On the other hand, "Vendor Name" is equivalent to "Seller Company" even though they are pretty different lexically.

The dot product, which is at the core of attention, is good for similarity, not identity. I think this is why models hallucinate: how can they tell the difference between "I have trained on this fact" and "this looks like something I trained on"?
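A small illustration of the similarity-versus-identity point, using character-trigram count vectors as a crude stand-in for learned embeddings (an assumption; real models learn their representations). Cosine similarity, a normalized dot product, rates the two address fields as close and the two equivalent vendor fields as unrelated.

```python
import math
from collections import Counter

def trigrams(text):
    """Count character trigrams, padding so word edges are represented."""
    t = f"  {text.lower()} "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

pairs = [("Address Line 1", "Address Line 2"),  # similar strings, distinct meaning
         ("Vendor Name", "Seller Company")]     # different strings, same meaning
for x, y in pairs:
    print(f"{x!r} vs {y!r}: {cosine(trigrams(x), trigrams(y)):.2f}")
```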

atrus 617 days ago

I don't think that fixes it, even in theory, since there's always some uncertainty.
