Hacker News
by nabakin 489 days ago
Automated benchmarks are still very useful. They are just less useful when an LLM has been trained in a way that overfits to them, which is why we have to be careful about claims made by random people. Human evaluation is the gold standard, but even it has issues.

The question is how do you train your LLMs to not 'cheat'?

Imagine you have an exam coming up, and the set of questions leaks - how do you prepare for the exam then?

Memorizing the test problems would obviously be problematic, but practicing the kinds of problems that appear on the exam might be less so, and just giving extra attention to the topics that will come up would be even less like cheating.

The more honest the approach you choose, the more indicative your exam results will be of your actual skill. But everybody decides how much cheating they allow themselves, which makes it a test of the student's honesty rather than their skill.

I think the only way is to check your dataset for leaked benchmark data and remove it before training, but (as you say) that assumes an honest actor is training the LLM, one willing to go against the incentive to leave the leak in the training data. Even then, a benchmark leak can make it through those checks.
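To make the "check your dataset" step concrete, here is a minimal sketch of one common way to do it: drop any training document that shares a long word n-gram with a benchmark item. The function names and the window size of 8 are my own assumptions for illustration, not anyone's actual pipeline.

```python
import re


def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in text (empty if text is too short)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_items, n=8):
    """Collect every n-gram that occurs in any benchmark item."""
    index = set()
    for item in benchmark_items:
        index |= ngrams(item, n)
    return index


def is_contaminated(document, index, n=8):
    """True if the document shares at least one n-gram with the benchmark."""
    return not ngrams(document, n).isdisjoint(index)


def decontaminate(corpus, benchmark_items, n=8):
    """Return the documents that have no n-gram overlap with the benchmark."""
    index = build_benchmark_index(benchmark_items, n)
    return [doc for doc in corpus if not is_contaminated(doc, index, n)]
```

This only catches verbatim or near-verbatim copies. A paraphrased or translated version of a benchmark question shares no long n-gram with the original, so it passes the filter, which is one way a leak still gets through.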

I think it would be interesting to create a dynamic benchmark. For example, a benchmark which uses math and a random value determined at evaluation for the answer. The correct answer would be different for each run. Theoretically, training on it wouldn't help beat the benchmark because the random value would change the answer. Maybe this has already been done.
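A rough sketch of what such a dynamic benchmark could look like, assuming the "model" is just a callable that takes a question string and returns an integer. Everything here (`make_item`, `evaluate`, the question template, the toy models) is hypothetical and only meant to illustrate the idea.

```python
import random
import re


def make_item(rng):
    """Generate one arithmetic question whose answer depends on random values."""
    a = rng.randint(100, 999)
    b = rng.randint(100, 999)
    c = rng.randint(2, 9)
    question = f"What is ({a} + {b}) * {c}?"
    return question, (a + b) * c


def evaluate(model, num_items=100, seed=None):
    """Score a model on freshly generated items; seed=None draws new items each run."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_items):
        question, answer = make_item(rng)
        if model(question) == answer:
            correct += 1
    return correct / num_items


def solver(question):
    """Toy 'honest' model: parses the numbers and actually does the math."""
    a, b, c = map(int, re.search(r"\((\d+) \+ (\d+)\) \* (\d+)", question).groups())
    return (a + b) * c


def make_memorizer(seed, num_items=100):
    """Toy 'cheating' model: memorizes the answers from one leaked run."""
    rng = random.Random(seed)
    leaked = dict(make_item(rng) for _ in range(num_items))
    return lambda question: leaked.get(question)
```

The memorizer scores perfectly only when evaluated with the same seed it leaked from. With a fresh seed the questions change, so the lookup table is close to useless, while the solver still gets everything right. The catch is that a model can still overfit to the question template itself, so this guards against memorized answers, not against training on the benchmark format.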