by BoorishBears
19 days ago
Every model release you'll post this, and every time I'll be there to point out how it's completely useless (for reasons you've said are intentional). It does things like place the old Gemini 3 Flash above the more capable 3.5 Flash and Opus 4.5 - Opus 4.8 and gpt-5.5. At least, until hopefully one day HN has a rule about accounts that derive 99.9999% of their engagement with the site from shilling a personal project.
|
I found it while trying to use 3.5 Flash to score the reasoning of some models: it gets the scores wrong because of centering bias, whereas 3 Flash gets the scoring right.