by famouswaffles 1070 days ago
Yeah and I'm saying I don't believe it.

I don't know what you're talking about. GPT-4 is the best model out there by a significant margin. That's coming from personal usage, not benchmarks. A 10% drop in traffic the first month students are out of school is not "losing users quickly" lol.

ChatGPT didn't gain public use by waving benchmarks around. We didn't even know what they were until GPT-4's release. The vast majority of its users neither know nor care about any of that. So your first sentence is just kind of nonsensical.

Anyway whatever. If that's what you believe then that's what you believe. Just realize you have nothing to back it up.

1 comments

Nobody has any evidence here. I'm saying that the incentives are such that the null hypothesis should be the opposite of what you think.
Your entire argument about incentives hinges on "OpenAI's main product is 'LLM that benchmarks the best,'" which is a particularly silly assertion when OpenAI did not release benchmark evaluations for 3.5 for months. Not when the product was released. Not even when the API was released.
You don't have to release official numbers to run benchmarks. You also don't have to own the LLM to run benchmarks. Within hours of GPT-4's emergence, many benchmarks had been run.
You said their main product was "LLMs that benchmark the best" like benchmarking was some important aspect of marketing. It's not. That's a fact. You can't say it's this hugely important thing and conveniently leave out that they make near zero effort to do anything with it.

Basically the only people running benchmarks that could have been gamed on GPT-4 were other researchers, not companies, customers or users looking to use a product.

Normal users are certainly not running benchmarks, and companies that do run benchmarks run them on internal data, which just defeats the whole point of gaming these research benchmarks.