by kvetching 93 days ago

https://artificialanalysis.ai/evaluations/omniscience?omnisc...

AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).
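To make the definition concrete, here is a minimal sketch of that ratio in Python. The function name, the parameter names, and the example counts are all hypothetical and chosen for illustration only; they are not taken from the Artificial Analysis site or its data, and returning 0.0 when there are no non-correct responses is an assumption of this sketch.

```python
def hallucination_rate(incorrect: int, partial: int, not_attempted: int) -> float:
    """Share of non-correct responses that were outright wrong answers.

    Follows the definition quoted above:
        incorrect / (incorrect + partial answers + not attempted)
    Correct answers do not appear in the formula at all.
    """
    non_correct = incorrect + partial + not_attempted
    if non_correct == 0:
        # Assumption: with no non-correct responses, report 0.0 instead of dividing by zero.
        return 0.0
    return incorrect / non_correct


# Hypothetical counts, for illustration only (not real benchmark results).
rate = hallucination_rate(incorrect=30, partial=10, not_attempted=60)
print(f"hallucination rate: {rate:.2f}")
```

Since correct answers are excluded from both numerator and denominator, a model can lower this number simply by declining to answer more often, so the metric says nothing on its own about how many questions the model gets right.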

Grok 4.2, which was just released in the API, just scored the best on this benchmark.

2 comments

SideQuark 93 days ago

Of all the valuable metrics on that site, all of which Grok does badly at except one, you managed to pick that single one.

https://artificialanalysis.ai/models

Braxton1980 91 days ago

This isn't a response to my question. I asked why you trust him.