by xienze 558 days ago
I’ve had multiple occasions where I’ve asked an LLM how to do <whatever> in Java and it’ll very confidently answer to use <some class in some package that doesn’t exist>. It would be far more helpful to me to receive an answer like “I don’t think there’s a third party library that does this, you’ll have to write it yourself” than to waste my time telling me a lie. If anything, calling these outputs “hallucinations” is a very polite way of saying that the LLM is bullshitting the user.
3 comments

Of course the LLM is bullshitting the user. That's precisely its purpose: LLMs are tools that generate comprehensible sounding language based on probability models that describe what words/tokens tend to be found in proximity to each other. An LLM doesn't actually know anything by reference to verifiable, external facts.

Sure, LLMs can be used as fancy search engines that index documents and then answer questions by referring to them, but even there, the probabilistic nature of the underlying model can still result in mistakes.

Models do know things. Facts are encoded in their parameters. Look at some of the interpretability research to see that. They aren't just Markov chains.
Nope. They don't know any specific facts. The training data produces a probability matrix that reflects what words are likely to be found in relation to other words, allowing it to generate novel combinations of words that are coherent and understandable. But there is no mechanism involved for determining whether those novel expressions are actually factual representations of reality.
Again, read the papers. They absolutely do know facts, and that can be seen in the activations. Your description is oversimplified. It's easy to get models to emit statistically improbable but correct sequences of words. They are not just looking at what words are near each other; that doesn't lead to the kind of output LLMs are capable of.
Exactly. People forget that we did make systems that were just Markov chains long before LLMs, like the famous Usenet Poster "Mark V. Shaney" (created by Rob Pike of Plan 9 and Golang fame) that was trained on Usenet posts in the 1980s. You didn't need deep learning or any sort of neural nets for that. It could come up with sentences that sometimes made some sort of sense, but that was it. The oversimplified way LLMs are sometimes explained makes it sound like they are no different from Mark V. Shaney, but they obviously are.

https://en.wikipedia.org/wiki/Mark_V._Shaney
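
For anyone who hasn't seen one, a word-level Markov chain generator fits in a few lines. This is a minimal sketch, not Mark V. Shaney's actual code; the function names, the toy corpus, and the two-word state are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, start, length=20, seed=None):
    """Walk the chain from `start`, sampling each next word by observed frequency."""
    rng = random.Random(seed)
    state = tuple(start)
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:  # dead end: nothing ever followed this state
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat ate the fish on the mat"
chain = build_chain(corpus, order=2)
print(generate(chain, ("the", "cat"), length=10, seed=0))
```

The only "knowledge" here is a lookup table of which word followed which pair of words, so every three-word window in the output is copied straight from the corpus.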

Yeah I get that, but at the same time we have AI hype men talking out of both sides of their mouth:

> This model is revolutionary, it knows everything, can answer anything with perfect accuracy!

“It’s fed me bullshit numerous times”

> OF COURSE it’s bullshitting you, don’t you know how LLMs work?

Like how am I supposed to take any of this tech seriously when the LLM is always answering questions as if it had the utmost confidence in what it is spitting out?

Hilariously, that really does basically define “bullshitting”.
Bullshit in the Frankfurtian sense.

There is a recent paper that explains it: https://link.springer.com/article/10.1007/s10676-024-09775-5

The LLM is always bullshitting the user. It's just sometimes the things it talks about happen to be real and sometimes they don't.
LLMs don't know things, they just string together responses that are a best fit for what follows from their prompt.

I suspect it's so hard to get them to say "I don't know" because if they were biased towards responding that way then I would assume that's almost all they would ever say, since "I don't know" is an appropriate answer to every question imaginable.

I get that, but since it is all probabilities, you might imagine even the LLM knows when it is skating on thin ice.

If I'm beginning with "Once / upon / a" I think the data will show a very high confidence in the word to follow with. So too I would imagine it would know when the trail of breadcrumbs it has been following is of the trashier and low probability kind.

So just tell me. (Or perhaps speak to me and when your confidence is low you can drift into vocal fry territory.)

Maybe just having a confidence weight assigned to each sentence the LLM generates, reflected in tooltips or text coloring, would be a big improvement.
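
A rough sketch of what a per-sentence weight could look like, assuming you already have a log-probability for each generated token. The input shape, the function name, and the numbers below are made up for illustration, and the geometric mean of token probabilities is only a crude proxy, not a calibrated measure of whether the sentence is true.

```python
import math

def sentence_confidences(tokens):
    """Group (token, logprob) pairs into sentences and score each one.

    The score is the geometric mean of the token probabilities,
    i.e. exp(mean logprob). A sentence ends at a token ending in '.', '!' or '?'.
    """
    sentences, current = [], []
    for text, logprob in tokens:
        current.append((text, logprob))
        if text.strip().endswith((".", "!", "?")):
            sentences.append(current)
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(current)

    scored = []
    for sent in sentences:
        mean_lp = sum(lp for _, lp in sent) / len(sent)
        sentence_text = "".join(t for t, _ in sent).strip()
        scored.append((sentence_text, math.exp(mean_lp)))
    return scored

# Hypothetical tokens and log-probabilities, invented for this example.
tokens = [
    ("Paris", -0.05), (" is", -0.02), (" in", -0.03), (" France.", -0.10),
    (" It", -0.40), (" has", -0.90), (" 4812", -3.20), (" bridges.", -1.50),
]
for sentence, score in sentence_confidences(tokens):
    print(f"{score:.2f}  {sentence}")
```

A UI could map the score to a text color or tooltip. Note that this measures how expected the wording was, which is not the same thing as how accurate it is.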
> the LLM knows

I don't think you get it.

He does get it and models do know their own confidence levels with a remarkably high degree of accuracy. The article states this clearly:

> Encoded truth: Recent work suggests that LLMs encode more truthfulness than previously understood, with certain tokens concentrating this information, which improves error detection. However, this encoding is complex and dataset-specific, hence limiting generalization. Notably, models may be encoding the correct answers internally despite generating errors, highlighting areas for targeted mitigation strategies.

Linking to this paper: https://arxiv.org/pdf/2410.02707

"Recent studies have demonstrated that LLMs’ internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized."

This was already known years ago, by the way. The meme that LLMs just generate statistically plausible text is wrong and has been from the start. That's not how they work.

>The meme that LLMs just generate statistically plausible text is wrong and has been from the start

Did you read that paper? It doesn't support discarding this "meme" at all. More importantly, I don't think it adequately supports that LLMs "know facts".

FFS, the actual paper is about training models on the LLM state to predict whether its actual output is correct. The interesting finding to them is that their models predict about a 75% chance of being correct even before the LLM starts generating text, that the conversation part of the answer has a low predicted chance of being correct, and that the "exact answer", a term they've created, is usually where the chance the LLM is correct (according to their trained model) peaks.

What they have demonstrated is that you can build a model that looks at in memory LLM state and have a 75% chance of guessing whether the LLM will produce the correct answer based on how the model reacts to the prompt. Even taking as a given (which you shouldn't in a science paper) that there's no trickery going on in the Probe models, accidental or otherwise, this is perfectly congruent with the statement that LLMs only "generate statistically probable text in the context of their training corpus and the prompt"
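
To make "probe model" concrete, here is a minimal sketch of the general idea: a small classifier trained on state vectors to predict whether the answer will be correct. It is not the paper's method or code. The "hidden states" are SYNTHETIC random vectors with a label-dependent shift, every name is hypothetical, and the accuracy it prints says nothing about any real LLM.

```python
import math
import random

def predict(w, b, h):
    """Probe output: estimated probability that the answer will be correct."""
    z = sum(wj * hj for wj, hj in zip(w, h)) + b
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(states, labels, epochs=200, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(states, labels):
            err = predict(w, b, h) - y
            for j in range(dim):
                grad_w[j] += err * h[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def make_synthetic(n, dim, rng):
    """SYNTHETIC stand-in for hidden states: noise plus a label-dependent shift."""
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = "answer was correct", 0 = "answer was wrong"
        shift = 0.5 if y else -0.5
        states.append([rng.gauss(0, 1) + shift for _ in range(dim)])
        labels.append(y)
    return states, labels

def accuracy(w, b, states, labels):
    """Fraction of examples where the probe's thresholded guess matches the label."""
    hits = sum((predict(w, b, h) >= 0.5) == bool(y) for h, y in zip(states, labels))
    return hits / len(states)

rng = random.Random(0)
train_x, train_y = make_synthetic(400, 8, rng)
test_x, test_y = make_synthetic(200, 8, rng)
w, b = train_probe(train_x, train_y)
print(f"held-out probe accuracy: {accuracy(w, b, test_x, test_y):.2f}")
```

The probe only shows that the label is recoverable from the vectors. What that implies about "knowing" is exactly the disagreement in this thread.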

Notably, why don't they demonstrate that you can predict whether a trained but completely unprompted model will "know" the answer? Why does the LLM have to process the conversation before you can predict with >90% chance whether it will produce the answer? If the LLM stores facts in its weights, you should be able to demonstrate that completely at rest.

IMO, what they've actually done is produce "Probe models" that can 75% of the time correctly predict whether an LLM will produce a certain token or set of tokens in its generation. That is coherent with an LLM model being, broadly speaking, a model of how tokens relate to each other from a point of view of language. The main quibble in these discussions is that doesn't constitute "knowing" IMO. LLMs are a model of language, not reality. That's why they are good at producing accurate language, and bad at producing accurate reality. That most facts are expressed in language doesn't mean language IS facts.

A question: Why don't LLMs produce garbage grammar when they "hallucinate"?

> why don't they demonstrate that you can predict whether a trained but completely unprompted model will "know" the answer?

The answer to what? You have to ask a question to test whether the answer will be accurate, and that's the prompt. I don't understand this objection.

> If the LLM stores facts in its weights, you should be able to demonstrate that completely at rest.

Sure, with good enough interpretability systems, and those are being worked on. Anthropic can already locate which parts of the model fire on specific topics or themes and force them on or off by manipulating the activation vectors.

> A question: Why don't LLMs produce garbage grammar when they "hallucinate"?

Early models did.

They mean the non-normalized probabilities for each token are available. Many APIs give access to the top-n. You can color the text based on them, or include them in your pipelines, for example to trigger an external lookup or inject claims of uncertainty (the same things I do). It's not remotely guaranteed, but it's some low-hanging fruit that can sometimes be useful.
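
A sketch of that kind of pipeline step, assuming the API response has already been parsed into a list of dicts with a chosen token, its log-probability, and the top-n alternatives. That dict shape, the thresholds, the function names, and the sample values are all hypothetical; real APIs use their own field names.

```python
import math

def flag_uncertain_tokens(tokens, min_prob=0.5, min_margin=0.2):
    """Return indices of tokens that look shaky.

    A token is flagged when its own probability is below `min_prob`, or when
    it beats the best alternative in the top-n list by less than `min_margin`.
    """
    flagged = []
    for i, tok in enumerate(tokens):
        p = math.exp(tok["logprob"])
        alternatives = sorted(
            (math.exp(lp) for t, lp in tok["top"] if t != tok["token"]),
            reverse=True,
        )
        runner_up = alternatives[0] if alternatives else 0.0
        if p < min_prob or p - runner_up < min_margin:
            flagged.append(i)
    return flagged

def annotate(tokens, flagged):
    """Wrap flagged tokens in [?...] as a plain-text stand-in for coloring."""
    marked = set(flagged)
    pieces = []
    for i, tok in enumerate(tokens):
        text = tok["token"]
        if i in marked:
            stripped = text.lstrip()
            lead = text[: len(text) - len(stripped)]  # keep leading whitespace outside the marker
            pieces.append(f"{lead}[?{stripped}]")
        else:
            pieces.append(text)
    return "".join(pieces)

# Hypothetical parsed output; tokens and log-probabilities are invented.
tokens = [
    {"token": "The", "logprob": -0.01, "top": [("The", -0.01), ("A", -5.0)]},
    {"token": " class", "logprob": -0.2, "top": [(" class", -0.2), (" method", -3.0)]},
    {"token": " FooUtils", "logprob": -1.6, "top": [(" FooUtils", -1.6), (" BarUtils", -1.7)]},
]
flagged = flag_uncertain_tokens(tokens)
needs_lookup = bool(flagged)  # e.g. trigger an external search or add a caveat
print(annotate(tokens, flagged), "| lookup needed:", needs_lookup)
```

A near-tie between two candidate class names is the sort of signal that could route the answer to an external check before showing it to the user.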

One of these days, someone will figure out how to include that in the training/inference loop. It's probably important for communication and reasoning, considering a similar concept happens in my head (some sort of sparsity detection).