| HN Mirror


by cornel_io 458 days ago

There are various results that suggest that LLMs do internally have everything they'd need to know that they're hallucinating/wrong:

https://arxiv.org/abs/2402.09733

https://arxiv.org/abs/2305.18248

https://www.ox.ac.uk/news/2024-06-20-major-research-hallucin...

So I don't think they lack a concept of correctness. They have one, but it's not strong enough. We're probably just not training them in ways that optimize for it over other desirable qualities, or at least not aggressively enough.

It's also clear to anyone who has used many different models over the years that the amount of hallucination goes down as the models get better, even without any special attention (apparently) being paid to that problem. GPT-3.5 was REALLY bad about this stuff, but 4o and o1 are at least mediocre. So it may be that this is just one of the tougher things for a model to figure out, even if it's possible with massive capacity and compute. But I'd say it's very clear that we're not in the world Gary Marcus wishes we were in, where some hard, fundamental limitation keeps a transformer network from becoming more truthful as it gets better. Rather, as with every other aspect, we just aren't as far along as we'd prefer.

2 comments

ForTheKidz 458 days ago

> There are various results that suggest that LLMs do internally have everything they'd need to know that they're hallucinating/wrong

We need better definitions of what people can reasonably expect when it comes to detecting incoherence and self-contradiction, given that humans are horrible at spotting these too, except by comparison with things that don't seem to produce meaningful language in the general case. We all hold contradictory worldviews and are therefore capable of rationally arriving at conclusions that are trivially and empirically incoherent. I think "hallucinations" (a horribly, horribly named term) are just an intractable burden of applying finite, lossy filters to a virtually continuous and infinitely detailed reality; language itself is sort of an ad-hoc, buggy consensus algorithm that's been sufficient to reproduce.

But yeah, if you're looking for a coherent and satisfying answer on, idk, politics, values, or basically anything that hinges on floating signifiers, you're going to have a bad time.

(Or perhaps you're just hallucinating understanding and agreement: there are many phrases in the English language that read differently depending on expected context and tone. It wouldn't surprise me if some models tended toward producing ambiguous or tautological statements, pleasingly hedged or "responsibly" moderated, aka PR.)

Personally, I don't think it's a problem. If you are willing to believe what a chatbot says without verifying it, there's little advice I could give you that would help. It's also good training to remind yourself that confidence is a poor signal of correctness.

AdieuToLogic 456 days ago

> There are various results that suggest that LLMs do internally have everything they'd need to know that they're hallucinating/wrong:

The underlying requirement all three assume, external detection, is what invalidates the claim that an LLM has "everything they'd need to know that they're hallucinating/wrong".

From the first arxiv abstract:

  Moreover, informed by the empirical observations, we show 
  great potential of using the guidance derived from LLM's 
  hidden representation space to mitigate hallucination.

From the second arxiv abstract:

  Using this basic insight, we illustrate that one can 
  identify hallucinated references without ever consulting 
  any external resources, by asking a set of direct or 
  indirect queries to the language model about the 
  references. These queries can be considered as "consistency 
  checks."

From the Nature abstract:

  Researchers need a general method for detecting 
  hallucinations in LLMs that works even with new and unseen 
  questions to which humans might not know the answer. Here 
  we develop new methods grounded in statistics, proposing 
  entropy-based uncertainty estimators for LLMs to detect a 
  subset of hallucinations—confabulations—which are arbitrary 
  and incorrect generations.
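As a rough illustration of the entropy idea in that abstract (a simplified sketch, not the paper's implementation): sample several answers to the same question, group answers that mean the same thing, and compute the entropy of the group frequencies. Consistent answers give low entropy; scattered answers give high entropy, which is the confabulation signal. This assumes a count-based estimate over sampled answers, and the `normalize` helper is a hypothetical, crude stand-in for the paper's meaning-based clustering. Note that even here, the "detection" is a procedure run on top of the model's outputs.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic clustering: lowercase and drop
    # punctuation so trivially different phrasings land in the same group.
    return "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace()).strip()


def semantic_entropy(samples: list[str]) -> float:
    """Entropy (in nats) of the frequency distribution over answer clusters."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    clusters = Counter(normalize(s) for s in samples)
    n = len(samples)
    # Sum -p * log(p) over each cluster's share of the samples.
    return sum(-(c / n) * math.log(c / n) for c in clusters.values())


# Same meaning every time: entropy 0. Three different answers: log(3).
consistent = semantic_entropy(["Paris", "paris.", "Paris"])
scattered = semantic_entropy(["Paris", "Lyon", "Marseille"])
```

A threshold on this value would flag the question as likely confabulated; string normalization is only a placeholder, since real paraphrases ("the capital is Paris" vs "Paris") would need a meaning-aware comparison.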

Ultimately, no matter what content is generated, it is up to a person to provide the understanding component.

> So I don't think it's that they have no concept of correctness, they do, but it's not strong enough.

Again, "correctness" is a determination solely made by a person evaluating a result in the context of what the person accepts, not intrinsic to an algorithm itself. All an algorithm can do is attempt to produce results congruent with whatever constraints it is configured to satisfy.