All jokes aside (and that did make me laugh) at least they're not training just to hit the benchmarks, which seem to be more meaningless as a quality indicator with each passing day.
I see at a few models (3 models in MMMU) that score lower than Nvidia's. But putting that aside, they at least get points for apparent objectivity. At least they probably aren't fudging numbers.
Edit: specifically ocrbench and VQAv2