| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by D-Machine 119 days ago
	It can be reasonable to be skeptical that advances on benchmarks may be only weakly or even negatively correlated with advances on real-world tasks. I.e. a huge jump on benchmarks might not be perceptible to 99% of users doing 99% of tasks, or some users might even note degradation on specific tasks. This is especially the case when there is some reason to believe most benchmarks are being gamed. Real-world use is what matters, in the end. I'd be surprised if a change this large doesn't translate to something noticeable in general, but the skepticism is not unreasonable here.

1 comments

throw310822 118 days ago

The GP comment is not skeptical of the jump in benchmark scores reported by one particular LLM. It's skeptical of machine intelligence in general, claims that there's no value in comparing their performances with those of human beings, and accuses those who disagree with this take of "hubris and grift". This has nothing to do with any form or reasonable skepticism.

link