Hacker News new | ask | show | jobs
by apetresc 792 days ago
That's a valid theory, a priori, but if you actually follow up you'll find that the vast majority of these benchmark results don't end up matching anyone's subjective experience with the models. The churn at the top is not nearly as fast as the press releases make it out to be.
1 comments

Subjective experience is not a benchmark that you can measure success against. Also, of course new models are better on some set of benchmarks. Why would someone bother releasing a "new" model that is inferior to old ones? (Aside from attributes like more preferable licensing).

This is completely normal, the opposite would be strange.