| HN Mirror



	by tngranados 252 days ago
	The point of benchmarking that is to check for hallucinations and overfitting. Does the model actually look at the picture and count the legs, or does it just see that it's a dog and answer four because it knows dogs usually have four legs? It's a perfectly valid benchmark and very telling.

1 comments

column 252 days ago

Very telling of what?


nsingh2 251 days ago

Telling of where the boundary of competence lies for these models. It also shows that these models aren't doing what most people expect them to be doing: rather than counting legs, they may be inferring the answer from the overall image (dogs usually have four legs), to the detriment of fine-grained or out-of-distribution tasks.
