You said their main product was "LLMs that benchmark the best," as if benchmarking were some important aspect of their marketing. It's not. That's a fact. You can't say it's this hugely important thing and conveniently leave out that they make near-zero effort to do anything with it.

Basically the only people running benchmarks that could have been gamed on GPT-4 were other researchers, not companies, customers or users looking to use a product.

Normal users are certainly not running benchmarks, and companies that do run benchmarks run them on internal data, which defeats the whole point of gaming these research benchmarks.