
by vark90 644 days ago

Yep, usually it's called abstention or rejection.

When people in this field compare methods for quantifying model uncertainty, they often perform what is called rejection verification. Basically, you progressively reject the data points with the highest uncertainty and track how the average quality of the remaining outputs increases. A good uncertainty estimate is strongly (inversely) correlated with output quality, so low-uncertainty outputs should have higher average quality.
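A minimal Python sketch of that rejection procedure. It assumes you already have one uncertainty score and one quality score (e.g. a task metric in [0, 1]) per output. The function name `rejection_curve` and the choice of rejection fractions are illustrative only and are not taken from the paper or from lm-polygraph.

```python
from typing import List, Sequence


def rejection_curve(uncertainty: Sequence[float],
                    quality: Sequence[float],
                    fractions: Sequence[float]) -> List[float]:
    """For each rejection fraction, drop that share of the most uncertain
    outputs and return the average quality of the outputs that remain.

    Hypothetical helper for illustration; not an API of any library.
    """
    if len(uncertainty) != len(quality) or not quality:
        raise ValueError("need equally sized, non-empty inputs")
    # Indices ordered from least to most uncertain.
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    n = len(order)
    curve = []
    for frac in fractions:
        n_reject = int(frac * n)      # number of most-uncertain points dropped
        keep = max(1, n - n_reject)   # always keep at least one point
        kept = order[:keep]
        curve.append(sum(quality[i] for i in kept) / keep)
    return curve


if __name__ == "__main__":
    unc = [0.1, 0.9, 0.3, 0.7]
    qual = [1.0, 0.0, 0.8, 0.2]
    fracs = [0.0, 0.5, 0.75]
    print(rejection_curve(unc, qual, fracs))
    # An "oracle" ranks by true quality, giving an upper bound to compare against.
    print(rejection_curve([-q for q in qual], qual, fracs))
```

With a useful uncertainty score the curve rises as more outputs are rejected, ideally approaching the oracle curve; with an uninformative (random) score it stays roughly flat around the overall mean quality.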

We use exactly this approach in our recent benchmark of uncertainty estimation methods for LLMs [1], and we have an open-source library under development [2] that supports this kind of benchmarking. It can also produce uncertainty scores for a given model output, so people in industry can integrate it into their applications as well.

[1] https://arxiv.org/abs/2406.15627

[2] https://github.com/IINemo/lm-polygraph