by sameoldtune 660 days ago
Slight sidebar based on the content of the article. I don’t like the term “hallucination” for when an LLM produces nonsense. It suggests the model otherwise has some grasp of reality, and that when it is wrong it is because it is hallucinating. Everything it produces is a “hallucination”; some of those are just more useful than others.
5 comments

It was a marketing term that worked both ways. In my company they don’t trust AI because, well, it doesn’t actually work; it’s just really good at being lucky. Which is fine for a lot of things, and absolutely horrible for things which aren’t fault tolerant. After LLMs got things wrong on a few contracts, top management issued a company-wide AI ban (outside of IT anyway, but that’s mainly because they hardly know what IT does).

I’m not sure we really suffer from it. In our internal analytics AI tends to slow employees down and make them less productive. This is in general. The exception is experts using it, where LLMs increase their value output by an ok margin. Especially within our own programming team LLMs have proven a real challenge. On one hand they are fancy auto-complete which will speed an experienced developer up by so much it’s hard to ignore. On the other hand it takes an experienced developer to know when they get things wrong. Not the things that literally won’t work, but the things that will work poorly. I haven’t been too involved outside of our team, but I imagine it’s the same in any field.

Which is where the “hallucination” term sort of backfired. It was a good way to make people buy into the value of LLMs by making the mistakes look like almost negligible oddities. The issue is that those mistakes can have such massive impacts that the entire trust in the AI industry falters. I mean, we had one of the CEOs ask if we could switch to Linux now that Windows includes AI… obviously we can’t do that without going bankrupt, but it tells you something about the worry in the non-tech enterprise top.

Same here. Many terms in the LLM world are not quite right, such as prompt “engineering”. It is almost like they were coined by non-tech folks.
What's wrong with the use of the word?
Engineering is something you calculate to get expected results with some accuracy, e.g. building a bridge without trial and error (hopefully). I would prefer prompt “tailoring”, which is more like starting with something general and fitting it to your desired output through many, many rounds of trial and error.
It's more like "social engineering" and other uses of the word which don't directly deal with engines, which is what "engineer" used to mean. For example, Google lists one of the definitions as "the action of working artfully to bring something about".
I think it's fair to say that these models may have some grasp of reality insofar as the data we collect ballparks reality, and also insofar as the mechanism to learn from the data effectively extracts the truth value of the data.

We might say the same thing about people.

Ultimately, just how problematic is it to label something as a hallucination? Are investors about to be massively duped? If I create a mechanism to reduce hallucinations and I call it therapy, is that really problematic?

> I think it's fair to say that these models may have some grasp of reality insofar as the data we collect ballparks reality, and also insofar as the mechanism to learn from the data effectively extracts the truth value of the data.

No, it would be fair to say they have a "grasp" of predicting the next word in a given sequence of words based on a set of words in their training set. Hallucination then is what people call their inherent tendency to run into a situation where the probability of the next word being predicted "wrong" is high. And once one "wrong" word has been predicted the probability that the next word is also "wrong" rises exponentially.
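
A toy way to look at the compounding claim above. This is only a sketch, and it rests on an assumption that is not from the thread: each token is independently correct with a fixed probability `p`. Under that assumption the chance that an `n`-token continuation contains no wrong token is `p ** n`, which shrinks geometrically with length. Real models are not independent from token to token, and `prob_sequence_correct` is a hypothetical helper name, so treat the numbers as illustration only.

```python
def prob_sequence_correct(per_token_accuracy: float, length: int) -> float:
    """Chance that every one of `length` tokens is correct.

    Assumes (hypothetically) that tokens are independent and share the
    same per-token accuracy; real LLM outputs do not satisfy this.
    """
    if not 0.0 <= per_token_accuracy <= 1.0:
        raise ValueError("per_token_accuracy must be in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return per_token_accuracy ** length


if __name__ == "__main__":
    # Even a 99% per-token accuracy erodes quickly over long outputs.
    for n in (10, 100, 1000):
        print(n, prob_sequence_correct(0.99, n))
```

Whether errors actually compound this way in practice is exactly what the replies below dispute.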

LLMs do not have any grasp of reality. They just predict text based on trained patterns. Too many people have been fooled into believing that LLMs can understand anything about reality, but a word-based description of reality is not the same as reality.

> We might say the same thing about people.

If you want to reduce a human being down to being a word predictor, then I guess you could say that?

You don't understand anything about reality. For all you know, you're living in Plato's cave. What you kind of know is just text you read from a physics book. The things your eyes see could just as easily be encoded as tokens and fed into an LLM.
That's not true. There have been several papers probing this with different methodologies and the conclusion is pretty clear.

LLMs know a whole lot more about the uncertainty of their predictions than they say.

GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824

The Internal State of an LLM Knows When It's Lying - https://arxiv.org/abs/2304.13734

LLMs Know More Than What They Say - https://arjunbansal.substack.com/p/llms-know-more-than-what-...

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334
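
For anyone wondering what "calibration" means concretely here: one common way to quantify it is expected calibration error (ECE), which bins predictions by stated confidence and compares each bin's average confidence with its actual accuracy. The sketch below is a generic illustration, not necessarily the metric or binning used in any of the links above; the function name, the equal-width bins, and the default of 10 bins are all assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between confidence and accuracy per bin.

    Hypothetical helper: equal-width bins over [0, 1], with each bin
    weighted by the share of predictions that fall into it.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that says "90% sure" and is right 9 times out of 10 scores near zero; one that says "100% sure" and is always wrong scores 1.0.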

This is good stuff, thanks for sharing
Technically the same problem applies to humans.