

	by astrange 605 days ago
	The model does not "output the token that is most likely to come next". The model provides a list of probabilities and the sampler algorithm picks one; those are two different components.
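To make the split concrete, here is a minimal sketch of the two components. Everything in it is an illustrative assumption rather than any real LLM API: `toy_model`, its three-word vocabulary, and its fixed logits are hypothetical stand-ins. The model only returns a probability distribution; which token comes out depends on the sampler placed after it.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(context):
    """Hypothetical stand-in for a language model.

    It ignores the context and returns a fixed distribution over a
    made-up vocabulary. A real model would compute logits from the context.
    """
    vocab = ["cat", "dog", "fish"]
    logits = [2.0, 1.0, 0.1]
    return vocab, softmax(logits)


def greedy_sampler(vocab, probs):
    """Always pick the highest-probability token."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]


def random_sampler(vocab, probs, rng):
    """Draw a token at random, weighted by the model's probabilities."""
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab, probs = toy_model("The pet is a")
greedy_token = greedy_sampler(vocab, probs)
sampled_token = random_sampler(vocab, probs, random.Random(0))
```

The same distribution can yield different tokens depending on which sampler runs, which is why "the model outputs the most likely token" describes only the greedy case.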


reshlo 605 days ago

The point is that neither the model nor the sampler algorithm can possibly have “confidence” in its behaviour or the system’s collective behaviour.

If I put a weight on one side of a die, and I roll it, the die is not more confident that it will land on that side than it would be otherwise, because dice do not have the ability to be confident. Asserting otherwise shows a fundamental misunderstanding of what a die is.

The same is true for LLMs.


astrange 605 days ago

I think it's better to say that it's not grounded in anything. (Of course, the sampler is free to verify it with some external verifier, and then it would be.)

But there are algorithms with stopping conditions (Newton-Raphson, gradient descent), and you could say that an answer is "uncertain" if it hasn't run long enough to come up with a good enough answer yet.
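As a rough illustration of a stopping condition, here is a sketch of Newton-Raphson applied to square roots. The function name `newton_sqrt`, the tolerance, the starting guess, and the iteration cap are all assumptions chosen for the example. The function returns the current estimate together with its residual and a flag saying whether the stopping condition was met, so a caller can see how far along the computation got.

```python
def newton_sqrt(a, tol=1e-12, max_iter=50):
    """Approximate sqrt(a) for a > 0 with Newton-Raphson.

    Returns (estimate, residual, converged), where residual is
    |estimate**2 - a| and converged reports whether the residual
    dropped to tol or below within max_iter update steps.
    """
    x = a if a > 1 else 1.0  # simple starting guess (an assumption)
    for _ in range(max_iter):
        residual = abs(x * x - a)
        if residual <= tol:  # stopping condition met
            return x, residual, True
        x = 0.5 * (x + a / x)  # Newton step for f(x) = x^2 - a
    return x, abs(x * x - a), False  # ran out of iterations


# Enough iterations: the stopping condition is reached.
root, residual, converged = newton_sqrt(2.0)

# Cut off after one step: the estimate is still rough and flagged as such.
rough, rough_residual, rough_converged = newton_sqrt(2.0, max_iter=1)
```

The residual and the flag are just numbers the procedure reports; whether that counts as the algorithm being "uncertain" is the point under dispute in this thread.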


reshlo 605 days ago

If we run the Newton-Raphson algorithm on some input and it hasn’t run long enough to come up with a good enough answer yet, then we are uncertain about the answer. It is not the case that the algorithm is uncertain about the answer. It would make no sense to make any claims about the algorithm’s level of certainty, because an algorithm does not have the capacity to be certain.


astrange 604 days ago

I'm not the one doing the arithmetic here; I've outsourced it to the computer. So I don't have any calculated uncertainty, because I'm not paying enough attention to know how much progress it's made.


reshlo 604 days ago

The important part is that the algorithm doesn’t either.
