Hacker News
by lispisok 372 days ago
>That being said, I'm starting to doubt the leaderboards as an accurate representation of model ability

Goodhart's law applies here just like everywhere else: once a leaderboard score becomes the target, it stops being a good measure of ability. Even more so given how much money these companies are dumping into making these models.