| HN Mirror


	by ACCount37 264 days ago
	That's because they are as close to "objective measures of capabilities" as anything we're ever going to get. Without benchmarks, you're down to evaluating model performance based on vibes and vibes only, which plain sucks. With benchmarks, you have numbers that correlate with capabilities somewhat.

achierius 264 days ago

That's assuming these benchmarks are the best we're ever going to get, which they clearly aren't. There's a lot to improve even without radical changes to how things are done.

ACCount37 264 days ago

The assumption I make is that "better benchmarks" are going to be 5% better, not 5000% better. LLMs are gaining capabilities faster than benchmarks are getting better at measuring them accurately.

So, yes, we just aren't going to get anything that's radically better. Just more of the same, and some benchmarks that are less bad. Which is still good. But don't expect a Benchmark Revolution when everyone suddenly realizes just how Abjectly Terrible the current benchmarks are, and gets New Much Better Benchmarks to replace them with. The advances are going to be incremental, unimpressive, and meaningful only in aggregate.

scuff3d 264 days ago

So because there isn't a better measure, it's okay that tech companies effectively lie and treat these benchmarks like they mean more than they actually do?

ACCount37 264 days ago

Sorry, pal, but if benchmarks were to disagree with opinions of a bunch of users saying "tech companies bad"? I'd side with benchmarks at least 9 times out of 10.

scuff3d 264 days ago

How does that have anything to do with what we're talking about?

ACCount37 263 days ago

What that has to do with it is: your "tech companies are bad for using literally the best tool we have for measuring AI capabilities when talking about AI capabilities" take is a very bad take.

It's like you wanted to say "tech companies are bad", and the rest is just window dressing.
