| HN Mirror

	by llm_trw 482 days ago
	In general yes, benchmark pollution is a big problem, which is why only dynamic benchmarks matter.

1 comment

This is true, but how would pollution work for a benchmark designed to test hallucinations?

A dataset of answers labelled as hallucinations or non-hallucinations gets published alongside the benchmark as part of a paper, and from there it can end up in training data.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.
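
To make the pollution mechanism concrete, here is a minimal sketch of a word n-gram overlap check between benchmark items and a training corpus. It is an illustration only, not how any particular lab checks for contamination: the function names `ngrams` and `contamination_rate`, the whitespace tokenization, the window size, and the sample strings are all assumptions made for the example.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, corpus_docs, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    if not benchmark_items:
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    flagged = [item for item in benchmark_items if ngrams(item, n) & corpus_grams]
    return len(flagged) / len(benchmark_items)


# Hypothetical data: one benchmark answer was quoted, with its label, on a web page.
benchmark = [
    "The capital of Australia is Sydney",
    "Water boils at 50 degrees",
]
corpus = [
    "blog post quoting the paper: the capital of australia is sydney label hallucination",
]

# A short window (n=3) is used here only because the sample strings are short.
print(contamination_rate(benchmark, corpus, n=3))
```

A static benchmark fails this kind of check as soon as its labelled answers are quoted anywhere that gets scraped. A dynamic benchmark avoids the problem because its items do not exist before evaluation time.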