Hacker News
by iambateman 982 days ago
This really is a good article, and it's seriously researched. But the conclusion in the headline, “AI hype is built on flawed test scores,” feels like a poor summary of it.

It _is_ correct to say that an LLM is not ready to be a medical doctor, even if it can pass the test.

But I think a better conclusion is that test scores don’t help us understand LLM capabilities like we think they do.

Using a human test for an LLM is like measuring a car’s “muscles” and calling it horsepower. They’re just different.

But the AI hype is justified, even if we struggle to measure what these models can actually do.