Currently, LLMs do not have executive or error-detection cognitive abilities. They have no theory of self, and no emotional instincts or imperatives. At the moment, LLMs are just mindless statistical models.
There is, however, a subfield of statistical ML devoted to model uncertainty quantification. By applying it to LLMs, I've developed a product that can score the trustworthiness of any LLM response. Like any ML-based product, my tool is not perfect, but it can detect incorrect LLM responses with pretty high precision/recall across applications spanning RAG / Q&A, data extraction, classification, summarization, and more.
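For anyone wondering what uncertainty quantification can look like for LLM outputs, here is a minimal sketch of one widely used heuristic: sample several answers to the same prompt and measure how often they agree (often called self-consistency). This is not the method behind the product above, which isn't described here; `agreement_score` is a hypothetical helper, and it assumes the answers were already sampled at nonzero temperature.

```python
from collections import Counter


def agreement_score(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of samples agreeing with it.

    Assumes `answers` are several responses sampled from the same prompt.
    Normalisation (strip + lowercase) is deliberately crude for illustration.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalised = [a.strip().lower() for a in answers]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer, count / len(normalised)


# Hypothetical samples for "What is the capital of France?"
print(agreement_score(["Paris", "paris", "Lyon", "Paris "]))  # ('paris', 0.75)
```

Low agreement is a signal that a response deserves scrutiny; high agreement does not guarantee correctness, since a model can be consistently wrong.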
> LLMs do not have […] error detection […] abilities
Are you saying that the beginning of the article, where it describes how the next token is predicted and how it's possible to know the distribution of possible next tokens, isn't accurate?
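To make "the distribution of possible next tokens" concrete: a model emits one raw score (logit) per vocabulary token, and a softmax turns those scores into probabilities. The entropy of that distribution is one common summary of how spread out it is. A minimal sketch, assuming a toy four-token vocabulary with made-up logits:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits; 0 means all mass is on a single token."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical logits for a 4-token vocabulary.
peaked = softmax([8.0, 1.0, 0.5, 0.1])
flat = softmax([1.0, 1.0, 1.0, 1.0])
print(round(entropy_bits(flat), 3))  # 2.0, the maximum for 4 tokens
print(round(entropy_bits(peaked), 3))  # much lower: most mass on one token
```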
A statistical model which is instructed to output the token that is most likely to come next doesn’t have “confidence” in its choice based on the distribution of possible tokens. We might, but it cannot. A statistical model cannot be confident or unsure. It has no mind.
It also has no concept of what it means for the choice of token to be an “error” or not, or what a “correct” answer would be.
The model does not "output the token that is most likely to come next". The model provides a list of probabilities and the sampler algorithm picks one; those are two different components.
The point is that neither the model nor the sampler algorithm can possibly have “confidence” in its behaviour or the system’s collective behaviour.
If I put a weight on one side of a die, and I roll it, the die is not more confident that it will land on that side than it would be otherwise, because dice do not have the ability to be confident. Asserting otherwise shows a fundamental misunderstanding of what a die is.
I think it's better to say that it's not grounded in anything. (Of course, the sampler is free to verify it with some external verifier, and then it would be.)
But there are algorithms with stopping conditions (Newton-Raphson, gradient descent), and you could say that an answer is "uncertain" if it hasn't run long enough to come up with a good enough answer yet.
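A minimal sketch of such a stopping condition, using Newton-Raphson on f(x) = x² − a to approximate √a. The `newton_sqrt` helper, the tolerance, and the iteration cap are illustrative choices. The function reports its last step size and whether that step fell below the tolerance; whether to call an unconverged run "uncertain" is exactly the terminology question being argued here.

```python
def newton_sqrt(a: float, tol: float = 1e-12,
                max_iter: int = 50) -> tuple[float, float, bool]:
    """Approximate sqrt(a) with Newton-Raphson on f(x) = x**2 - a.

    Returns (estimate, last_step_size, converged), where converged means the
    last step was smaller than `tol` before `max_iter` iterations ran out.
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    # Start at or above sqrt(a) so the iterates decrease monotonically.
    x = a if a > 1 else 1.0
    step = float("inf")
    for _ in range(max_iter):
        new_x = x - (x * x - a) / (2 * x)  # Newton update: x - f(x) / f'(x)
        step = abs(new_x - x)
        x = new_x
        if step < tol:
            return x, step, True
    return x, step, False


print(newton_sqrt(2.0))              # converged: estimate close to 1.41421356...
print(newton_sqrt(2.0, max_iter=1))  # (1.5, 0.5, False): stopped too early
```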
If we run the Newton-Raphson algorithm on some input and it hasn’t run long enough to come up with a good enough answer yet, then we are uncertain about the answer. It is not the case that the algorithm is uncertain about the answer. It would make no sense to make any claims about the algorithm’s level of certainty, because an algorithm does not have the capacity to be certain.
"Confidence" doesn't have to be an emotional state. Here it's essentially just another word for "probability": a model's confidence in X is the probability it assigns to X. Isn't this common terminology?
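In that probabilistic sense, an autoregressive model assigns a probability to a whole response by chaining its per-token conditionals: log P(x₁…xₙ) = Σᵢ log P(xᵢ | x₍<ᵢ₎). A minimal sketch, assuming we already have the probability of each emitted token (the values below are made up):

```python
import math


def sequence_logprob(token_probs: list[float]) -> float:
    """Log-probability of a whole sequence under an autoregressive model.

    `token_probs[i]` is assumed to be P(x_i | x_<i), the probability the model
    gave to the token that was actually emitted at position i.
    """
    if any(not 0 < p <= 1 for p in token_probs):
        raise ValueError("each probability must be in (0, 1]")
    return sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities for a 4-token answer.
probs = [0.9, 0.8, 0.95, 0.7]
lp = sequence_logprob(probs)
print(math.exp(lp))               # probability of the whole answer, about 0.479
print(math.exp(lp / len(probs)))  # per-token geometric mean, length-normalised
```

Summing logs rather than multiplying probabilities avoids floating-point underflow on long sequences.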
It may be terminology that some people use in that way, but it’s becoming increasingly common for people describing LLMs to use such terminology to mean that the LLM literally has the capacity for understanding.
Personally, until recently I could only recall people saying things along the lines of “applying the model indicates that we can state this fact about the data with this much confidence”, never “the model has this much confidence” in some truth statement, especially one independent of its training data.
You’re missing my point. Take one of the articles described in that comment, titled “The Internal State of an LLM Knows When It's Lying”. It states “In this paper, we provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements.” Both of these are untrue, for a number of reasons.
- An LLM knowing when it is lying is not the same thing as its internal state being able to “reveal the truthfulness of statements”. The LLM does not know when it is lying, because LLMs do not know things.
- It is incapable of lying, because lying requires an intent to lie. Stating untrue things is not the same as lying.
- As the paper states shortly afterwards, what it actually shows is “given a set of test sentences, of which half are true and half false, our trained classifier achieves an average of 71% to 83% accuracy”. That’s not the same thing as it being able to “reveal the truthfulness of statements”.
No intellectually honest person would claim that this finding means an LLM “knows when it is lying”.
I'm not missing your point. I just don't think you're making one.
You keep saying the same nonsense over and over again. An LLM does not know things, so... what kind of argument is that? You're working backwards from a conclusion that is nothing but your own erroneous convictions about what a "statistical model" is, and you're undertaking a whole lot of mental gymnastics to stay there.
There are a lot of papers there that all try to approach this in different ways. You should read them and try to make an honest argument, one that doesn't boil down to "This doesn't count because [claim that is in no way empirically or theoretically validated]."
You are the one claiming that LLMs are conscious, so it falls to you to prove it.
I argued that LLMs do not have the capacity to have ideas or to know things, and you tried to prove me wrong by providing examples of papers that show, for example, that LLMs have internal states that can be used to predict the likelihood that what they will output will be facts. But that doesn’t disprove what I said, because that’s not what it means to have ideas or know things. By definition, only conscious beings can do those things.
It's definitely not accurate to treat that sort of prediction error, or any other internal value, as an overall measure of the confidence, accuracy, "truth", etc. of the language the LLM produces.
I find they do have very sophisticated emotional intelligence and theory of self. If you don't find that, I suppose you must not have much curiosity about pushing the boundaries of what is possible with them.