by lancebeet 118 days ago
If the benchmarks are fishy, I'd expect the bias to inflate scores for proprietary models, since their vendors have more incentive to game the benchmarks.