| HN Mirror



	by dodslaser 581 days ago
	What else should you train for? If the benchmark doesn't represent real-world scenarios, isn't that a problem with the benchmark rather than the model?

4 comments

isoprophlex 581 days ago

If your benchmark covers all possible programming tasks, then you don't need an LLM; you need a search over your benchmark.

Hypothetically, let's say the benchmark contains "test divisibility of this integer by n" for all n of the form 3x+1. An extremely overfit LLM won't be able to code divisibility for any n that is not of the form 3x+1, and your benchmark will never tell you.
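A minimal sketch of that hypothetical, assuming a toy "model" that has simply memorized the divisors present in the benchmark. The names `benchmark_divisors`, `make_overfit_solver`, and `accuracy` are made up for illustration, and the lookup-based solver is only a stand-in for an overfit LLM, not a claim about how any real model behaves.

```python
def benchmark_divisors(limit):
    """Divisors of the form 3x + 1 that the hypothetical benchmark covers."""
    return [n for n in range(1, limit + 1) if n % 3 == 1]


def make_overfit_solver(seen_divisors):
    """Stand-in for an overfit model: correct only for divisors it has seen."""
    seen = set(seen_divisors)

    def is_divisible(value, n):
        if n in seen:
            return value % n == 0
        # Unseen divisor: the "model" has nothing memorized and always says no.
        return False

    return is_divisible


def accuracy(solver, divisors, values):
    """Fraction of (value, n) cases where the solver matches the true answer."""
    cases = [(v, n) for n in divisors for v in values]
    correct = sum(solver(v, n) == (v % n == 0) for v, n in cases)
    return correct / len(cases)


LIMIT = 30                      # assumed range of divisors, chosen arbitrarily
values = range(1, 200)          # assumed test integers, chosen arbitrarily

in_benchmark = benchmark_divisors(LIMIT)
held_out = [n for n in range(1, LIMIT + 1) if n % 3 != 1]

solver = make_overfit_solver(in_benchmark)
benchmark_score = accuracy(solver, in_benchmark, values)
held_out_score = accuracy(solver, held_out, values)

print(f"benchmark accuracy: {benchmark_score:.2f}")
print(f"held-out accuracy:  {held_out_score:.2f}")
```

The benchmark score is perfect because every benchmark case uses a memorized divisor. The gap only shows up when you evaluate on divisors the benchmark never included.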


YetAnotherNick 580 days ago

No, because solving a well-defined problem with a well-defined right or wrong answer is generally not what people use LLMs for. Most of the time my query to the LLM is underspecified, and a lot of the time I figure out the problem while chatting with the LLM. And a benchmark, by definition, only measures right/wrong answers.


LeoPanthera 580 days ago

This is called Goodhart's law, after Charles Goodhart, who said: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."

But in modern usage it is often rephrased as: "When a measure becomes a target, it ceases to be a good measure."

https://en.wikipedia.org/wiki/Goodhart%27s_law


exitb 581 days ago

Overfitting is a concern.


ithkuil 580 days ago

It's more subtle than this:

https://en.m.wikipedia.org/wiki/Training,_validation,_and_te...
