by jampekka
31 days ago
Models can answer "I don't know". Hallucination benchmarks, including this one, give models the option to "not attempt" a question. The problem is that the linked metric doesn't take the rate of correct answers into account at all. It is useful for comparing incorrect vs. not-attempted answers, but it gives only a very partial picture.
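To make that concrete, here is a rough sketch. It assumes the metric is roughly incorrect / (incorrect + not attempted), which is my reading of it and not a confirmed definition. The function names and the counts for the two models are made up for illustration.

```python
def hallucination_rate(correct, incorrect, not_attempted):
    # Assumed definition: of the questions the model did not get right,
    # the share it answered wrongly instead of declining.
    # Note that `correct` is never used, which is the point being made.
    missed = incorrect + not_attempted
    return incorrect / missed if missed else 0.0


def accuracy(correct, incorrect, not_attempted):
    # Share of all questions answered correctly.
    total = correct + incorrect + not_attempted
    return correct / total if total else 0.0


# Hypothetical counts out of 100 questions each (not real benchmark data).
model_a = dict(correct=90, incorrect=5, not_attempted=5)
model_b = dict(correct=10, incorrect=45, not_attempted=45)

for name, counts in [("A", model_a), ("B", model_b)]:
    print(name, hallucination_rate(**counts), accuracy(**counts))
```

Under this assumed definition both models score the same on the metric, even though one answers 90% of questions correctly and the other only 10%.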