Hacker News
DatBench fixes VLM evals: 70% blindly solvable, 42% mislabeled, 35% prod gap (datologyai.com)
5 points by hurrycane 165 days ago