by esafak 296 days ago
That is not the same thing! You are talking about the point distribution of the next token. We are talking about the uncertainty associated with each of those candidate tokens; a distribution of distributions.

It's the difference between a categorical distribution and a Dirichlet. https://en.wikipedia.org/wiki/Dirichlet_distribution
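
To make the distinction concrete, here is a minimal NumPy sketch. The 3-token vocabulary, the probabilities, and the two concentration values are made up for illustration. Both Dirichlets have the same mean (the same categorical "point" answer), but they express very different confidence in that answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical next-token probabilities over a 3-token vocabulary.
# This single vector is a categorical distribution: one point estimate.
p = np.array([0.7, 0.2, 0.1])

def dirichlet_samples(mean, concentration, n=10_000):
    """Draw n probability vectors from a Dirichlet whose mean is `mean`.

    Larger `concentration` means the samples cluster tightly around `mean`.
    """
    return rng.dirichlet(mean * concentration, size=n)

# Two "distributions over distributions" with the same mean.
confident = dirichlet_samples(p, concentration=1000.0)
unsure = dirichlet_samples(p, concentration=5.0)

confident_mean, confident_std = confident.mean(axis=0), confident.std(axis=0)
unsure_mean, unsure_std = unsure.mean(axis=0), unsure.std(axis=0)

print("confident mean/std:", confident_mean.round(3), confident_std.round(3))
print("unsure    mean/std:", unsure_mean.round(3), unsure_std.round(3))
```

Looking only at the mean, the two cases are indistinguishable; the spread of the samples is the second-order uncertainty being discussed.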


I think we're talking about the same thing. I should be clear that I don't think the selected token probabilities being reported are enough, but if you're reporting each returned token's probability (both selected and discarded) and aggregating the cumulative probabilities over the given context, it should be possible to see when you're trending towards uncertainty.
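
A rough sketch of what that aggregation could look like, assuming you have the full per-step distribution available. The numbers, the 3-token vocabulary, and the helper names are invented for illustration. Note that this only measures how spread out the reported probabilities are, not how reliable they are.

```python
import numpy as np

def token_entropies(step_probs):
    """Shannon entropy (in nats) of each step's next-token distribution."""
    p = np.asarray(step_probs, dtype=float)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def running_mean(values, window):
    """Moving average over `window` consecutive steps."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical per-step probabilities (selected and discarded tokens).
steps = [
    [0.97, 0.02, 0.01],
    [0.90, 0.05, 0.05],
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
]

ent = token_entropies(steps)
trend = running_mean(ent, window=2)
print("per-token entropy:", ent.round(3))
print("running mean:     ", trend.round(3))
```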
No, it isn't the same thing. The softmax probabilities are estimates; they're part of the prediction. The other poster is talking about the uncertainty in these estimates, so the uncertainty in the softmax probabilities.

The softmax probabilities are usually not a very good indication of uncertainty, as the model is often overconfident due to neural collapse. The uncertainty in the softmax probabilities is a good indication though, and can be used to detect out-of-distribution entries or poor predictions.
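
One common way to approximate the uncertainty in the softmax probabilities is to look at disagreement between several predictions for the same input, e.g. from an ensemble or repeated stochastic forward passes. That choice of method is an assumption here, not something stated above, and the logits below are made up. The sketch computes the entropy of the averaged prediction minus the average entropy of each member, which is near zero when members agree and large when they don't.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    """Shannon entropy in nats over the last axis."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def disagreement(member_logits):
    """Entropy of the mean prediction minus the mean per-member entropy.

    `member_logits` has shape (members, vocab): one row per hypothetical
    ensemble member (or stochastic forward pass) for the same input.
    """
    probs = softmax(np.asarray(member_logits, dtype=float))
    mean_p = probs.mean(axis=0)
    return entropy(mean_p) - entropy(probs).mean()

# Members agree: every softmax is confident about token 0.
agree = [[4.0, 0.0, 0.0], [4.1, 0.0, 0.1], [3.9, 0.1, 0.0]]
# Members disagree: every softmax is just as confident, but about a different token.
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

mi_agree = disagreement(agree)
mi_disagree = disagreement(disagree)
print("agree:", round(float(mi_agree), 4), "disagree:", round(float(mi_disagree), 4))
```

In the second case each individual softmax looks just as confident as in the first, which is why a single softmax output on its own can be misleading.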