by fnord123
125 days ago
> the AA-Omniscience Hallucination Rate Benchmark which puts 3.0 Pro among the higher hallucinating models. 3.1 seems to be a noticeable improvement though. As sibling comment says, AA-Omniscience Hallucination Rate Benchmark puts Gemini 3.0 as the best performing aside from Gemini 3.1 preview. https://artificialanalysis.ai/evaluations/omniscience
https://artificialanalysis.ai/#aa-omniscience-hallucination-...
If you look at the results, 3.0 hallucinates an awful lot when it's wrong.
It's just not wrong that often.
(And it looks like 3.1 does better on both fronts)
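
To make the distinction concrete, here is a rough sketch with made-up numbers. It assumes the hallucination rate is computed as wrong answers divided by all non-correct responses (wrong plus abstained), which is my reading of the benchmark and not something I've verified. The two models and their counts are hypothetical, not taken from the leaderboard.

```python
def hallucination_rate(correct: int, incorrect: int, abstained: int) -> float:
    """Share of non-correct responses that were confident wrong answers.

    Assumed definition: incorrect / (incorrect + abstained).
    """
    not_correct = incorrect + abstained
    return incorrect / not_correct if not_correct else 0.0


def wrong_answer_share(correct: int, incorrect: int, abstained: int) -> float:
    """Share of ALL questions that got a wrong answer."""
    total = correct + incorrect + abstained
    return incorrect / total if total else 0.0


# Hypothetical counts per 100 questions: (correct, incorrect, abstained).
model_a = (80, 18, 2)   # knows a lot, rarely abstains when it doesn't know
model_b = (50, 20, 30)  # knows less, but often says "I don't know"

for name, counts in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: "
        f"hallucination rate = {hallucination_rate(*counts):.2f}, "
        f"wrong answers overall = {wrong_answer_share(*counts):.2f}"
    )
```

Under those assumptions, model A has the much higher hallucination rate (it almost never abstains when it doesn't know) yet still gives fewer wrong answers overall, which is the pattern described above for 3.0.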