by mmiyer 556 days ago
I guess it's because it has the highest instruction-following score of all models, 20 points higher than Opus. That compensates for shortcomings elsewhere (e.g. in language), but it wouldn't necessarily translate to human evaluations of usefulness.

Wow, yeah, I think you're right: 3.3 somehow gets the top position on the entire leaderboard for that category. I bet that skews the average up a lot.
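
To make the skew concrete, here's a rough sketch of the arithmetic. It assumes the leaderboard's overall score is an unweighted mean of category scores, which the thread doesn't confirm. Every number is made up for illustration except the 20-point instruction-following gap mentioned above, and the model and category names (`model_a`, `model_b`, `reasoning`, `coding`) are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical per-category scores, invented purely for illustration.
# Only the 20-point instruction-following gap comes from the thread.
scores = {
    "model_a": {"instruction_following": 90.0, "language": 50.0,
                "reasoning": 60.0, "coding": 60.0},
    "model_b": {"instruction_following": 70.0, "language": 58.0,
                "reasoning": 63.0, "coding": 63.0},
}


def overall(category_scores):
    """Unweighted mean across all categories (an assumed aggregation rule)."""
    return mean(category_scores.values())


def overall_without(category_scores, dropped):
    """Unweighted mean with one category left out."""
    return mean(v for k, v in category_scores.items() if k != dropped)


for name, cats in scores.items():
    with_if = overall(cats)
    without_if = overall_without(cats, "instruction_following")
    print(f"{name}: all categories = {with_if:.2f}, "
          f"without instruction following = {without_if:.2f}")
```

With these made-up numbers, `model_a` comes out ahead on the overall average but falls behind once instruction following is dropped, which is the kind of skew being guessed at here.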