| HN Mirror



	by cjtrowbridge 665 days ago
	I love how they include a helpful chart that shows this model scores worse than everything else.

5 comments

kibibu 665 days ago

Am I looking at the wrong table? It dominates everything on visual interpretation benchmarks.

Edit: specifically OCRBench and VQAv2.


Der_Einzige 665 days ago

It's not that bad, and I'd much rather that they be honest instead of lying like everyone else does.


butterfly42069 665 days ago

All jokes aside (and that did make me laugh) at least they're not training just to hit the benchmarks, which seem to be more meaningless as a quality indicator with each passing day.


miffy900 665 days ago

I see a few models (3 models in MMMU) that score lower than Nvidia's. But putting that aside, they at least get points for apparent objectivity. At least they probably aren't fudging numbers.


GaggiX 665 days ago

Well, but it actually doesn't, unless you're looking only at MMMU.


dr_kiszonka 665 days ago

Exactly. On some benchmarks it is close to or better than GPT-4o.

I wonder if one of the reasons they released it was to respond to OpenAI's plans to enter the chipmaking market.
