

	by jampekka 31 days ago
	Models can answer "I don't know". Hallucination benchmarks, including this one, give models the option to "not attempt" a question. The problem is that the linked metric doesn't take the rate of correct answers into account at all. It's useful for comparing incorrect vs. not-attempted answers, but on its own it gives a very partial picture.
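
	To make the point concrete, here is a rough sketch. It assumes the metric is something like incorrect / (incorrect + not attempted); that definition is my assumption, not taken from the benchmark, and the counts are made up for illustration.

```python
def hallucination_rate(correct, incorrect, not_attempted):
    # Assumed definition: of the questions not answered correctly,
    # the share that were answered wrongly instead of declined.
    # Note that `correct` is deliberately unused.
    missed = incorrect + not_attempted
    return incorrect / missed if missed else 0.0


def accuracy(correct, incorrect, not_attempted):
    # Share of all questions answered correctly.
    total = correct + incorrect + not_attempted
    return correct / total if total else 0.0


# Hypothetical counts out of 100 questions each.
model_a = dict(correct=90, incorrect=5, not_attempted=5)
model_b = dict(correct=10, incorrect=45, not_attempted=45)

for name, counts in (("A", model_a), ("B", model_b)):
    print(name, hallucination_rate(**counts), accuracy(**counts))
```

	Both hypothetical models get the same score on the assumed metric even though one answers far more questions correctly, which is why the metric needs to be read alongside accuracy.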