Kaggle competitions rarely produce interesting algorithmic results.

Still, I highly encourage you to read the winners' solutions. They are full of clever data insights, augmentations, regularizations, feature engineering, and preprocessing and postprocessing tricks.

But above all, compared to the academic literature, it's shocking how much time and creativity they spend on validation. Maybe I'm reading the wrong papers, but the flashy new neural architectures rarely even mention their validation setup; Kaggle winners sometimes devote half of their explanation to it. It's part of their secret sauce.

Two personal favorites:

(1) https://www.kaggle.com/c/severstal-steel-defect-detection/di.... The "random defect blackout" was a really clever data augmentation.

(2) https://www.kaggle.com/c/ieee-fraud-detection/discussion/111.... Particularly how they reduced overfitting with adversarial validation. They trained a separate model to distinguish between train and test sets, and then dropped features that ranked highly in feature importance on that model. That's probably a well-known technique in some circles, but I had never seen anything like it before.
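
For anyone who wants to see the idea in (2) concretely, here is a rough sketch of adversarial validation. It is not the winners' code. I don't know which model or importance measure they used, so this assumes a tiny hand-rolled logistic regression (to avoid dependencies) and treats the absolute weight on standardized features as the "importance". The feature names, toy data, and threshold are all made up for illustration.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance so weights are comparable."""
    n_cols = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_cols)]
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def adversarial_importance(train_rows, test_rows, epochs=200, lr=0.1):
    """Fit a train-vs-test classifier and return |weight| per feature.

    Label 0 means "came from train", label 1 means "came from test".
    A feature with a large weight helps tell the two sets apart,
    i.e. its distribution differs between train and test.
    """
    X = standardize(train_rows + test_rows)
    y = [0.0] * len(train_rows) + [1.0] * len(test_rows)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):  # plain batch gradient descent on log loss
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [abs(wj) for wj in w]


def features_to_drop(names, importance, threshold):
    """Names of features whose adversarial importance exceeds the threshold."""
    return [name for name, imp in zip(names, importance) if imp > threshold]


# Toy data (hypothetical): the third feature drifts between train and test.
rng = random.Random(0)
feature_names = ["amount", "hour", "days_since_start"]  # made-up names
train = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
test = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(3, 1)] for _ in range(200)]

importance = adversarial_importance(train, test)
dropped = features_to_drop(feature_names, importance, threshold=1.0)  # arbitrary cutoff
print(dropped)
```

On this toy data the drifting column should be the one that stands out. In practice you would probably swap in a tree-based model with its own importance scores, and re-check your real validation score after each drop, since a feature that separates train from test can still carry useful signal.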