by kostaj
16 days ago
This matches my own observations and tests. It's also supported by how the verdicts are distributed across the four buckets: Gemini uses the middle buckets (Mostly True and Misleading) much less often, only 6% combined for Gemini without search, while Opus uses them the most, at 45% combined. It looks like Gemini is calibrated to be confident and Opus to be careful.