by pclmulqdq 1071 days ago
You don't have to release official numbers to run benchmarks. You also don't have to own the LLM to run benchmarks. Within hours of GPT-4's emergence, many benchmarks had been run.

You said their main product was "LLMs that benchmark the best," as if benchmarking were some important aspect of their marketing. It's not. That's a fact. You can't say it's this hugely important thing and conveniently leave out that they make near-zero effort to do anything with it.

Basically the only people running benchmarks that could have been gamed on GPT-4 were other researchers, not companies, customers or users looking to use a product.

Normal users are certainly not running benchmarks, and companies that do run benchmarks run them on internal data, which defeats the whole point of gaming these research benchmarks.