Hacker News
by int_19h 558 days ago
Benchmarks are way too easy to game. There's no shortage of models that "beat GPT-4" on some benchmark or another, yet are obviously nowhere near it when you try them on novel tasks.