by yunusabd 54 days ago
Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.
1 comment

Maybe you shouldn't be relying on something if you can't even tell how good it is?