Hacker News
by MattDaEskimo 481 days ago
There's a serious issue with benchmarks.

Instead of resolving it, some leaders are further complicating what benchmarks mean.

For example, OpenAI grades its benchmarks on "how much money they made" or "how easily a model could be convinced to hand over fake money".