by cornel_io
458 days ago
There are various results suggesting that LLMs do internally have everything they'd need to know that they're hallucinating or wrong: https://arxiv.org/abs/2402.09733 https://arxiv.org/abs/2305.18248 https://www.ox.ac.uk/news/2024-06-20-major-research-hallucin... So I don't think it's that they have no concept of correctness; they do, but it's not strong enough. We're probably just not training them in ways that optimize for it over other desirable qualities, at least not aggressively enough. It's also clear to anyone who has used many different models over the years that the amount of hallucination goes down as the models get better, even without any special attention (apparently) being paid to the problem. GPT-3.5 was REALLY bad about this stuff, but GPT-4o and o1 are at least mediocre. So it may just be one of the tougher things for a model to figure out, even if it's possible with massive capacity and compute. But I'd say it's very clear that we're not in the world Gary Marcus wishes we were in, where some hard, fundamental limitation keeps a transformer network from becoming more truthful as it gets better; rather, as with every other aspect, we just aren't as far along as we'd prefer.
We need better definitions of what people can reasonably expect when it comes to detecting incoherence and self-contradiction, given that humans are horrible at spotting it too, except in comparison to things that don't seem to produce meaningful language in the general case. We all hold contradictory worldviews and are therefore capable of rationally arriving at conclusions that are trivially and empirically incoherent. I think "hallucinations" (a horribly, horribly named term) are just an intractable burden of applying finite, lossy filters to a virtually continuous and infinitely detailed reality; language itself is a sort of ad hoc, buggy consensus algorithm that's been sufficient to reproduce.
But yeah, if you're looking for a coherent and satisfying answer on, idk, politics, values, or basically anything that hinges on floating signifiers, you're going to have a bad time.
(Or perhaps you're just hallucinating understanding and agreement: there are many phrases in the English language that read differently depending on expected context and tone. It wouldn't surprise me if some models tended toward producing ambiguous or tautological semantics, pleasingly hedged or "responsibly" moderated; in other words, PR.)
Personally, I don't think it's a problem. If you're willing to believe what a chatbot says without verifying it, there's little advice I could give you that would help. It's also good training to remind yourself that confidence is a poor signal of correctness.