by scrollop
178 days ago
Alright, so we have more benchmarks now, including one for hallucinations, and Flash doesn't do well on that. Overall, though, it beats Gemini 3 Pro, GPT 5.1 Thinking, and GPT 5.2 Thinking xhigh (but then, Sonnet, Grok, Opus, Gemini and 5.1 all beat 5.2 xhigh), basically everything. Crazy. https://artificialanalysis.ai/evaluations/omniscience