

by beernet 23 days ago

> Ultimately I think the only way you can trust benchmarks is if you build them yourself and keep them secret from the AI labs.

I agree.

At the same time, one of the first things we see in the HN comments when a new model is released is pelicans on a bike. Makes you wonder where the priorities of the AI "community" lie when karma farming is the main motivation for model "evaluation".