by m_eiman 847 days ago
Can we stop calling AIs giving incorrect information “hallucinations”, please? It’s just a clever PR stunt to sweep the glaring issues under the carpet.

Intractable model error that's elemental to the approach won't get you any funding though.

Anthropomorphizing statistical learning is how you build a hype machine to cash out people with zero handle on the subject. See the comment below about "AI judges" and "true justice". Just like early electricity, all people see is magic.

No, it's good that the public understands that AIs are wrong so regularly that we need a special word dedicated to this one specific manner in which they're wrong.

Generative AI output is becoming inextricably associated with this word, and that's not a bad thing to keep people aware of.

There should be a special word for the rare occasion when the LLM generates truth.
"miracle"? ;)
> No, it's good that the public understands that AIs are wrong so regularly

_compared to what_, exactly. Compared to a Google search? Compared to asking a random person? Compared to Wikipedia? New York Times journalists?

Any of those things are wrong _very_ frequently. It's such an uninteresting thing to call out every time an AI is wrong, when it is right about things so frequently that people don't bother to notice how amazing it is that it gets anything correct about the world at all.

The whole point is that it's a qualitatively different kind of failure, it's not quantitative. Wikipedia doesn't hallucinate. It can be wrong, sure, but it doesn't do what LLMs do when they go off the rails. So there should be a word that applies to LLM output but not a person or article that's simply wrong.
Why?
Why should there be different words for different concepts? I don't understand the question. We already use different terms for "lying" and "mistaken". We've invented a new way to be wrong and calling it something different conveys more nuance than just calling it "wrong".
I guess I don't understand the difference between an LLM "hallucinating" by probabilistically choosing the wrong output for a given input pattern, and a human doing the same thing and just being "wrong". (But to be fair, this could just be my own lack of understanding about how LLMs and human brains work!)

I've certainly made that class of error myself, when I assumed that something followed a similar pattern (like in math, or writing & grammar, or coding) when it actually didn't.

I've also doubled down on those errors when I tried to double-check my work, believing I had misapplied some intermediate step rather than having taken an entirely wrong approach to begin with.

I think the "why" here is "why are we assuming this failure mode is unique to LLMs and deserves novel terminology".

The goal of anyone contributing something useful shouldn't be to immortalise one's name. You, by defending this practice, give yourself away by having similar ambitions.
what
you get what you sow. don't put stupid comments out there.
Any recommendations? The public seems to actually understand what this means although it’s just more anthropomorphization of a random bullshit generator.
How about you call them what they are:

Bugs, Defects and "not fit for production".

How about we stop with all the Nonsense around calling it "temperature" like it's a sick baby and call it RAND cause that's what it is.

The P. T. Barnum levels of bullshit around ML (see, we have a term that isn't using "artificial" or "intelligence") have gotten old. Sam Altman is the next Elizabeth Holmes.

</rant>

I came here to suggest the same thing. This "hallucination" soft euphemism seems to be the tech press's way to continue to write positively about defective AI software while lightheartedly joking about how it sometimes does an oopsie.

If I ask a software to write about a well known fact or historical event and it just makes stuff up, it's not simply hallucinating. It's defective.

The thing is, it isn't a defect. People misunderstand that there is no practical difference between a "hallucinated" result and a real one, as far as an LLM is concerned. It doesn't reason or calculate beyond matching tokens; it has no deeper contextual understanding of truth or correctness beyond statistical likelihood. Hallucinations are the result of the LLM doing exactly what it's designed to do, exactly the way it's designed to do it.

The defect isn't in the software, but in people expecting these things to operate the way AIs in sci-fi do, or who believe that because they can produce coherent results in natural language, they must be sentient and self-aware.
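To make the "statistical likelihood" point concrete, here is a minimal Python sketch of temperature sampling over a toy next-token distribution. It is an illustration under stated assumptions, not any real model's code: the helper `sample_next_token` is hypothetical, and the candidate tokens and their scores for a prompt like "The capital of Australia is" are made up. What it shows is that one and the same draw yields the correct token or a wrong one, and nothing in the procedure checks which is true.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one token from a toy next-token distribution (hypothetical helper).

    logits: dict mapping candidate token -> unnormalized score (made-up values).
    temperature: divides the scores before the softmax; lower values sharpen
    the distribution toward the top token, higher values flatten it.
    """
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # choices() draws proportionally to the weights; it has no notion of
    # which token is "true", only which one is more likely.
    token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    return token, probs


# Usage: made-up scores for the prompt "The capital of Australia is".
rng = random.Random(0)
toy_logits = {"Canberra": 2.0, "Sydney": 1.5, "Melbourne": 0.5}
counts = {tok: 0 for tok in toy_logits}
for _ in range(1000):
    token, _ = sample_next_token(toy_logits, temperature=1.0, rng=rng)
    counts[token] += 1
print(counts)  # the wrong answers are drawn by the exact same code path
```

Lowering `temperature` makes the most likely token dominate, but it only reshapes the same distribution; if the highest-scoring token is wrong, the sampler will confidently pick the wrong one.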

It's a defect from the point of view of user expectations. When Intel's Pentium floating point (FDIV) bug was in the news, I remember a small number of people claiming it was not a defect because the chip was just doing what it was designed to do. Yeah, it was designed in such a way that it could produce incorrect results. In other words, a bug!

I'm sure AI companies will get very good at explaining away these defects with various forms of "aCkShUaLlY" but when your marketing materials say you made a box that takes a prompt and answers it, and it answers incorrectly, what else is it than a defect?

The problem, in that case, exists between the keyboard and chair.

Floating point math is inherently inexact, and no programmer using it would expect perfect precision and then call it a defect when they don't get it. You have to understand how floating point works and take that inaccuracy into account. As a result, there are some applications for which using floats is simply a bad idea. No one sane is doing real money calculations with floats.
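A minimal Python illustration of the float behaviour described above, using the classic 0.1 + 0.2 example, along with the usual workarounds: comparing with a tolerance, or using exact decimal arithmetic for money.

```python
import math
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the rounded sum differs from the rounded literal 0.3.
float_sum = 0.1 + 0.2
print(float_sum)         # 0.30000000000000004
print(float_sum == 0.3)  # False

# Accounting for the error: compare with a tolerance instead of ==.
print(math.isclose(float_sum, 0.3))  # True

# For money, Decimal keeps exact base-10 amounts.
decimal_sum = Decimal("0.10") + Decimal("0.20")
print(decimal_sum)                     # 0.30
print(decimal_sum == Decimal("0.30"))  # True
```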

The same goes for LLMs. Hallucination is fundamental to the model. We're going to have to realize that there are many tasks for which AI simply isn't well suited. And we're going to have to get over this persistent delusion that humans are categorically worse than AI at everything. A paralegal doing research would probably not simply fabricate cases and citations out of whole cloth. That's not how most humans work. Humans are capable of knowing when they don't know something; AI is not.

But we've decided, for whatever reason, that AI is perfectly trustworthy. That's going to keep biting us in the ass until we learn.

Fit for purpose... most of the time, except when it isn't, and then oops... Let's color in the failure with a human term, "hallucination", cause "we can't really fix it".

Sugar-coating the fact that it is defective (defined: imperfect or faulty) isn't changing things.

Your explanation is correct, it's defective by design.

LLMs are hallucinating machines. They never not hallucinate. Coincidentally, sometimes they hallucinate something true.
This is exactly why we shouldn't call it a hallucination when the AI outputs false statements.

Saying it hallucinated is just a tautology.

I forget where I originally heard this idea, but I always explain to people that LLMs are (affectionately) "bullshitters." Terms like "lying" or "hallucinating" imply that it's trying to tell the truth, but actually it doesn't care if what it says is true or not at all save for the fact that true text is slightly more plausible than false text.
Instead of ‘hallucinations’, try ‘samplings from the model that happen not to be sufficiently reminiscent of reality’. Of course, it’s a little bit less catchy. But that’s the problem with catchiness — it sticks regardless of its truth.

The fact that ‘correct’ outputs are treated as if they’re the product of an in-any-way-different process to the ‘hallucinated’ ones is the problem.

> The fact that ‘correct’ outputs are treated as if they’re the product of an in-any-way-different process to the ‘hallucinated’ ones is the problem.

Also, this particular context just makes it easier to notice, compared to a 5000-word generated coherent word salad that is equally wrong, but spread across all 5000 words.

call it what it is: random bullshit.
I guess that's fewer syllables than "hallucination."

I'm not sure how I feel about the term "hallucination" as it's applied to AI. Since you seem strongly opposed to it, let me ask you this long-winded half-question:

People understand computer things by creating analogies to the physical world - just look at the "Desktop" motif. "Folders" and "Files" too, for that matter. It seems to me that anthropomorphization would fit under that umbrella, though you may disagree. How do you feel about computer anthropomorphization in general? Is there something about "hallucination" that's particularly offensive?

Well, both the true and false outputs are equally random bullshit from a machine. "Hallucination" is just the word that caught on to describe random bullshit outputs that are false.
Even "bullshit" implies something like a mind, with intent to deceive. It should be more like noise, aberrations, incorrectly extrapolated filler material.
Bullshit is not deception. There is no intent to convince. Bullshit is even less than that. Bullshit is just blowing hot air to create a buffer between reality and its consequences.
I always thought of hallucinations/dreams as "random bullshit in my brain"
Even actual computer "hallucinations" aren't so great.

"Wargames" (1983): https://www.youtube.com/watch?v=71k7-dGhNFQ&t=4m8s