by BoorishBears
372 days ago
The benchmarks also claim random 32B-parameter models beat Claude 4 at coding, so we know just how much they matter. It should be obvious to anyone with even a cursory interest in model training: you can't trust benchmarks unless they're fully private black boxes. If you can get even a hint of the shape of the questions on a benchmark, it's trivial to synthesize massive amounts of data that help you beat it. And given the nature of funding right now, you're almost silly not to do it: it's not cheating, it's "demonstrably improving your performance at the downstream task."