by BoorishBears 207 days ago
The gold standard for cheating on a benchmark is to SFT on it and ignore the memorization. That's why the standard quick test for benchmark contamination has always been to swap out the specifics of the task.

Like replacing named concepts with nonsense words in reasoning benchmarks.
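
A rough sketch of what that swap could look like, in Python. Everything here is hypothetical and for illustration only: the helper names (`nonsense_word`, `perturb`, `accuracy_gap`), the consonant-vowel word shape, and the idea of passing the named concepts in by hand are all assumptions, not any particular benchmark's tooling.

```python
import random
import re


def nonsense_word(rng, length=6):
    """Build a pronounceable lowercase nonsense word (consonant-vowel pattern)."""
    consonants = "bdfgklmnprstvz"
    vowels = "aeiou"
    return "".join(
        rng.choice(consonants if i % 2 == 0 else vowels) for i in range(length)
    )


def perturb(task, concepts, seed=0):
    """Replace each named concept in `task` with a nonsense word.

    Returns the rewritten task and the mapping used. The same concept always
    maps to the same nonsense word, so the task's structure is preserved.
    """
    if not concepts:
        return task, {}
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    mapping, used = {}, set()
    for concept in concepts:
        word = nonsense_word(rng)
        while word in used:  # keep replacements distinct
            word = nonsense_word(rng)
        used.add(word)
        mapping[concept] = word
    # Longest names first so a short name never clobbers part of a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], task), mapping


def accuracy_gap(original_correct, perturbed_correct):
    """Accuracy on the original tasks minus accuracy on the perturbed ones.

    Both arguments are equal-length lists of booleans, one per task.
    """
    n = len(original_correct)
    return (sum(original_correct) - sum(perturbed_correct)) / n


if __name__ == "__main__":
    task = "If Alice has 3 apples and Bob gives Alice 2 apples, how many apples does Alice have?"
    rewritten, mapping = perturb(task, ["Alice", "Bob", "apples"])
    print(rewritten)
    print(mapping)
```

The reasoning needed to solve the rewritten task is unchanged, so a model that actually reasons should score about the same on both versions. A large positive `accuracy_gap` is a hint (not proof) that the original wording was memorized.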

1 comment

Yes. But "the gold standard" just means "the most natural, easy and dumb way".