by tylerneylon
596 days ago
I couldn't figure out whether this project is based on an academic paper, meaning some published technique for determining LLM uncertainty. This recent work is highly relevant: https://learnandburn.ai/p/how-to-tell-if-an-llm-is-just-gues... It uses an idea called semantic entropy, which is more sophisticated than the standard entropy of the token distribution (computed from the logits) and is better suited to quantifying statistically whether an LLM is guessing or is highly certain. The original paper is in Nature, by authors from Oxford.
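To make the idea concrete, here is a minimal sketch of semantic entropy as I understand it: sample several answers to the same prompt, group the ones that mean the same thing, and take the entropy over the groups rather than over the raw strings. The function names and the `toy_same_meaning` check are my own placeholders, not anything from the paper or from this project; a real setup would decide equivalence with something like an entailment model instead of string normalization.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster it matches."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if same_meaning(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for a real equivalence check (assumption)."""
    def normalize(s: str) -> str:
        return s.lower().strip(" .")
    return normalize(a) == normalize(b)


samples = ["Paris", "paris.", "Paris", "Lyon"]
print(semantic_entropy(samples, toy_same_meaning))
```

If every sample lands in one cluster the value is zero, however differently the answers are worded; it grows as the samples spread across distinct meanings.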
But even with this in mind, there are caveats. We recently published a comprehensive benchmark [2] of state-of-the-art approaches to estimating LLM uncertainty. We found that while these semantic-aware methods perform very well in many cases, on other tasks simple baselines, such as the average entropy of the token distributions, perform on par with or better than the more complex techniques.
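For reference, the simple baseline mentioned above can be sketched in a few lines. This is only an illustration, assuming you already have the model's next-token probability distribution at each generation step; the function names are placeholders and not the API of any particular library.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    # Zero-probability tokens contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_token_entropy(step_distributions: Sequence[Sequence[float]]) -> float:
    """Average the per-step entropies over the generated sequence."""
    if not step_distributions:
        raise ValueError("need at least one generation step")
    total = sum(token_entropy(d) for d in step_distributions)
    return total / len(step_distributions)


# Toy example over a 3-token vocabulary: one confident step, one uncertain step.
steps = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
]
print(mean_token_entropy(steps))
```

A higher average means the model was, on the whole, less sure which token to emit next; unlike semantic entropy it needs only a single generation.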
We have also developed an open-source Python library [3], still in early development, that offers implementations of all modern uncertainty estimation (UE) techniques applicable to LLMs. It allows easy benchmarking of UE methods as well as estimating output uncertainty for models deployed in production.
[1] https://arxiv.org/abs/2307.01379
[2] https://arxiv.org/abs/2406.15627
[3] https://github.com/IINemo/lm-polygraph