Hacker News new | ask | show | jobs
by MichealCodes 266 days ago
The benchmarks are not typically ongoing, we do not often see comparisons between week 1 and week 8. Sprinkle a bit of training on the benchmarks in and you can ensure higher scores for the next model. A perfect scam loop to keep the people happy until they wise up.
1 comments

> The benchmarks are not typically ongoing, we do not often see comparisons between week 1 and week 8

You don't need to compare "A (Week 1)" to "A (Week 8)" to be able to show "B (Week 1)" is genuinely x% better than "A (Week 1)".

As I said sprinkle a bit of benchmarks polluting the training and you have your loop. Each iteration will be better at benchmarks if that's the goal and that goal/context reinforces.
Sprinkling in benchmark training isn't a loop, it's just plain cheating. Regardless, not all of these benchmarks are public and, even with mass collusion across the board, it wouldn't make sense only open weight LLMS have been improving.