The model does not "output the token that is most likely to come next". The model provides a list of probabilities and the sampler algorithm picks one; those are two different components.
The point is that neither the model nor the sampler algorithm can possibly have “confidence” in its behaviour or the system’s collective behaviour.
If I put a weight on one side of a die, and I roll it, the die is not more confident that it will land on that side than it would be otherwise, because dice do not have the ability to be confident. Asserting otherwise shows a fundamental misunderstanding of what a die is.
I think it's better to say that it's not grounded in anything. (Of course, the sampler is free to verify it with some external verifier, and then it would be.)
But there are algorithms with stopping conditions (Newton-Raphson, gradient descent), and you could say that an answer is "uncertain" if it hasn't run long enough to come up with a good enough answer yet.
If we run the Newton-Raphson algorithm on some input and it hasn’t run long enough to come up with a good enough answer yet, then we are uncertain about the answer. It is not the case that the algorithm is uncertain about the answer. It would make no sense to make any claims about the algorithm’s level of certainty, because an algorithm does not have the capacity to be certain.
I'm not the one doing the arithmetic here, I've outsourced it to the computer. So I don't have any calculated uncertainty because I'm not paying enough attention to know how much progress it's made.
If I put a weight on one side of a die, and I roll it, the die is not more confident that it will land on that side than it would be otherwise, because dice do not have the ability to be confident. Asserting otherwise shows a fundamental misunderstanding of what a die is.
The same is true for LLMs.