
If two (or more) synonymous tokens each have high probability (say 49.9% each, for a total of 99.8%), that's still low entropy. Not as low as a single high-probability token, but low enough for us to consider this a low-entropy token distribution.
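To put rough numbers on that, here is a minimal sketch that computes Shannon entropy in bits for the two-synonym case, a single dominant token, and a uniform distribution. The leftover 0.2% lumped into one token and the 50,000-token vocabulary are assumptions for illustration, not anything from a specific model.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumption: the remaining 0.2% of probability mass sits on one other token.
two_synonyms = [0.499, 0.499, 0.002]
single_peak = [0.998, 0.002]

# Assumption: a hypothetical 50,000-token vocabulary, uniformly distributed,
# as the maximum-entropy reference point.
uniform_50k = [1 / 50_000] * 50_000

print(f"two synonyms: {entropy_bits(two_synonyms):.3f} bits")
print(f"single peak:  {entropy_bits(single_peak):.3f} bits")
print(f"uniform 50k:  {entropy_bits(uniform_50k):.3f} bits")
```

The two-synonym case comes out at roughly one bit: higher than the single-peak case, but a small fraction of the uniform maximum.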

You can't look at a single token distribution, though. There are many legitimate high-confidence, high-accuracy cases in which many tokens could come next. For example, the first token of a paragraph. You need to look at pools of entropies over segments of the output or the whole output sequence.
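One way to pool entropies over segments is a sliding-window mean, sketched below. The function name `pooled_entropy`, the window size, and the per-token entropy values are all made up for illustration; this is not a description of any particular production method.

```python
from statistics import mean

def pooled_entropy(token_entropies, window=4):
    """Mean entropy over each sliding window of `window` consecutive tokens."""
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(token_entropies)
    if n <= window:
        # Sequence shorter than one window: pool over whatever is there.
        return [mean(token_entropies)] if n else []
    return [mean(token_entropies[i:i + window]) for i in range(n - window + 1)]

# Hypothetical per-token entropies in bits: an isolated spike at position 0
# (e.g. the first token of a paragraph), then a sustained run at 5..8.
entropies = [6.0, 0.2, 0.1, 0.3, 0.2, 3.5, 4.0, 3.8, 4.2, 0.1]
pooled = pooled_entropy(entropies, window=4)

# Index of the window with the highest pooled entropy.
worst = max(range(len(pooled)), key=pooled.__getitem__)
print(worst, pooled[worst])
```

With these made-up values, the isolated spike gets averaged down by its low-entropy neighbors, while the sustained run dominates the pooled score. That is the behavior you want if a single uncertain token is often legitimate.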

Although uncertainty correlates with hallucinations and inaccuracies, it doesn't guarantee them. This is a challenging area; we're following the latest literature and contributing where we can.