by michaelt 590 days ago
> one way is to ask for a "final answer" so the final response token logprobs can be evaluated

Alas, this won't work.

Imagine I ask an LLM to continue the sentence "Summing those up: 4+6.75+6.52=17.27 litres of pure alcohol. In summary, the total amount of pure alcohol they have is: "

The logprobs of the next token do not represent the LLM's confidence in its own answer. They represent the LLM's confidence in its ability to repeat the total from 18 words previously.
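
To make that concrete, here's a toy sketch. It is not a real LLM: `toy_next_token_logprobs` is a made-up stand-in that just copies the last total it sees after an `=`, and the 0.98 is an assumed number. The point is only that a model which is good at copying reports the same logprob whether the arithmetic it is copying is right or wrong.

```python
import math
import re

def toy_next_token_logprobs(prompt: str) -> dict[str, float]:
    """Hypothetical stand-in for an LLM's next-token distribution.

    It puts most of its probability on repeating the last number that
    follows an '=' in the prompt, and never checks the arithmetic.
    """
    totals = re.findall(r"=\s*(\d+(?:\.\d+)?)", prompt)
    copied = totals[-1]
    # Assumed values: 0.98 on the copied total, the rest on anything else.
    return {copied: math.log(0.98), "<other>": math.log(0.02)}

SUFFIX = (" litres of pure alcohol. In summary, the total amount of "
          "pure alcohol they have is: ")
right = "Summing those up: 4+6.75+6.52=17.27" + SUFFIX
wrong = "Summing those up: 4+6.75+6.52=18.27" + SUFFIX  # deliberately bad sum

lp_right = toy_next_token_logprobs(right)
lp_wrong = toy_next_token_logprobs(wrong)

# Same "confidence" in both cases, even though only one total is correct.
print(lp_right["17.27"], lp_wrong["18.27"])
```

Any signal about whether the sum itself was right would have to come from the tokens where the total was first produced, not from the restatement at the end.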