DatBench fixes VLM evals: 70% blindly solvable, 42% mislabeled, 35% prod gap

(datologyai.com)
5 points by hurrycane 165 days ago