by YeGoblynQueenne 1436 days ago
As an addendum, note that training for a competition does not eliminate this overfitting to the test set. Most competitions make the test set instances available, though not their labels. Many restrict the number of submissions that can be made, but usually accept several. There's even a bit of jargon for gaming this: it's called "hill climbing the test set" (really, it's hill climbing the performance on the test set, i.e. it's the accuracy on the test set that's optimised). Here's an actual how-to:

https://blockgeni.com/how-to-hill-climb-the-test-set-for-mac...
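
For a rough idea of the mechanics, here's a minimal sketch of my own (not the code from the linked article). Everything in it is a made-up stand-in: `leaderboard_score` plays the role of a competition's scoring server, `hill_climb_test_set` is a hypothetical name, and the labels are random binary values. Note that no model and no features are involved at all; the only signal is the score that comes back after each submission.

```python
import random


def make_hidden_labels(n, seed=0):
    """Simulated test labels that the competitor never sees directly."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]


def leaderboard_score(predictions, hidden_labels):
    """Hypothetical scoring server: returns accuracy and nothing else."""
    correct = sum(p == y for p, y in zip(predictions, hidden_labels))
    return correct / len(hidden_labels)


def hill_climb_test_set(hidden_labels, n_submissions, seed=1):
    """Flip one prediction per submission; keep the flip only if the score rises."""
    rng = random.Random(seed)
    n = len(hidden_labels)

    # Start from random guesses (about 50% accuracy on binary labels).
    best = [rng.randint(0, 1) for _ in range(n)]
    best_score = leaderboard_score(best, hidden_labels)
    history = [best_score]

    for _ in range(n_submissions):
        candidate = best[:]
        i = rng.randrange(n)
        candidate[i] = 1 - candidate[i]  # change a single prediction

        # The labels are only ever touched through the score.
        score = leaderboard_score(candidate, hidden_labels)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)

    return best, history


if __name__ == "__main__":
    labels = make_hidden_labels(50)
    _, history = hill_climb_test_set(labels, n_submissions=2000)
    print(f"accuracy before: {history[0]:.2f}, after: {history[-1]:.2f}")
```

The submission count here is far larger than a real competition would allow, so take it as an exaggeration of the effect. The point is that every accepted submission leaks a little information about the hidden labels, and a handful of submissions is enough to leak some.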

One benchmark I know of where the test set is completely hidden is François Chollet's ARC dataset, and that's done precisely to preclude overfitting to the test set.