by beernet
23 days ago
> Ultimately I think the only way you can trust benchmarks is if you build them yourself and keep them secret from the AI labs.

I agree. At the same time, one of the first things we see in the HN comments when a new model is released is pelicans on a bike. It makes you wonder where the priorities of the AI "community" lie when karma farming is the main motivation for model "evaluation".