Hacker News
by brookst 482 days ago
This is true, but how would pollution work for a benchmark designed to test hallucinations?

Say a paper built on the benchmark publishes a dataset of answers, each labelled as a hallucination or not a hallucination. That dataset ends up online, and from there it can end up in training data.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.
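To make the leak concrete, here is a minimal sketch of one common way people check for it: flag a benchmark item if it shares a long run of words (an n-gram) with any training document. The function names, the 8-word threshold, and the example strings are all hypothetical, chosen for illustration. Real contamination checks differ in tokenization, n, and corpus scale.

```python
# Hypothetical sketch: n-gram overlap as a contamination heuristic.
# Names, threshold, and example data are illustrative assumptions.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-split word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contaminated(item: str, corpus: list[str], n: int = 8) -> bool:
    """True if any n-gram of the benchmark item appears in any corpus document."""
    item_grams = ngrams(item, n)
    if not item_grams:  # item shorter than n words: nothing to match
        return False
    return any(item_grams & ngrams(doc, n) for doc in corpus)


if __name__ == "__main__":
    # A labelled benchmark item, as it might appear in a published dataset.
    item = ("question: what is the capital of the fictional country zorbania "
            "answer: zorbania is fictional label: not hallucination")
    corpus = [
        # A scraped page that reprints the paper's examples verbatim.
        "appendix b of the paper lists examples such as " + item,
        # An unrelated page.
        "a blog post about baking sourdough bread at home",
    ]
    print(contaminated(item, corpus))      # → True
    print(contaminated(item, corpus[1:]))  # → False
```

Once the labelled answers sit verbatim in a crawled page, the benchmark item and the training text overlap word for word, which is exactly what makes the benchmark score unreliable.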