by llm_trw 482 days ago
In general, yes: benchmark pollution is a big problem, which is why only dynamic benchmarks matter.

This is true, but how would pollution work for a benchmark designed to test hallucinations?
A dataset of labelled answers, some hallucinations and some not, is published based on the benchmark as part of a paper.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.