Hacker News
by beernet 23 days ago
> Ultimately I think the only way you can trust benchmarks is if you build them yourself and keep them secret from the AI labs.

I agree.

At the same time, one of the first things we see in the HN comments when a new model is released is pelicans on a bike. Makes you wonder where the priorities of the AI "community" lie when karma farming is the main motivation for model "evaluation".