HN Mirror



	by MichealCodes 266 days ago
	The benchmarks are not typically ongoing, we do not often see comparisons between week 1 and week 8. Sprinkle in a bit of training on the benchmarks and you can ensure higher scores for the next model. A perfect scam loop to keep people happy until they wise up.


zamadatix 265 days ago

> The benchmarks are not typically ongoing, we do not often see comparisons between week 1 and week 8

You don't need to compare "A (Week 1)" to "A (Week 8)" to be able to show "B (Week 1)" is genuinely x% better than "A (Week 1)".


MichealCodes 265 days ago

As I said, sprinkle in a bit of benchmark data to pollute the training set and you have your loop. Each iteration will be better at the benchmarks if that's the goal, and every iteration reinforces that goal.


zamadatix 265 days ago

Sprinkling in benchmark training isn't a loop, it's just plain cheating. Regardless, not all of these benchmarks are public, and even with mass collusion across the board, it wouldn't make sense that only open-weight LLMs have been improving.
