by 10000truths 743 days ago
Is "leakage" just another term for overfitting?
3 comments

I think a popular example of leakage would be that of a tank recognition AI that perfectly handles training/testing data but fails in real use, because all the tanks of one country happen to have a tree in the background, while those of the other do not, effectively leaking the image label and making the model look for a tree instead of the tank. Even if you trained less or used fewer parameters, it'd still go for the easiest route of trying to detect features of a tree. You'd have to change the training data.
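To make the tree shortcut concrete, here is a toy sketch in plain Python. Everything in it is made up for illustration: `tank_cue` stands in for a genuine but noisy signal (assumed right 80% of the time), `tree` for the background artifact, and the "model" is just whichever single feature agrees with the label most often on the training rows.

```python
import random

rng = random.Random(1)

def make_rows(n, tree_matches_label):
    """Build (features, label) rows; features are (tank_cue, tree)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Genuine but noisy signal: agrees with the label 80% of the time (assumed).
        tank_cue = label if rng.random() < 0.8 else 1 - label
        # Background tree: equals the label in the collected photos,
        # but is unrelated to the label out in the field.
        tree = label if tree_matches_label else rng.randint(0, 1)
        rows.append(((tank_cue, tree), label))
    return rows

def feature_accuracy(rows, i):
    """Accuracy of predicting the label directly from feature i."""
    return sum(x[i] == y for x, y in rows) / len(rows)

collected = make_rows(1000, tree_matches_label=True)
field = make_rows(1000, tree_matches_label=False)
train, test = collected[:800], collected[800:]

# "Training": pick the single feature that best matches the label.
best = max(range(2), key=lambda i: feature_accuracy(train, i))

print("chosen feature:", ["tank_cue", "tree"][best])
print("held-out accuracy:", feature_accuracy(test, best))
print("field accuracy:", feature_accuracy(field, best))
```

The held-out split comes from the same collected photos, so it carries the same artifact and can't reveal the problem. Only data gathered under different conditions does.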
No, usually it means that the data you intended to test the model on was accidentally used to train it. There are also subtler scenarios where you get leakage without ever showing the model the test examples, for instance when your features contain future information that you won't have at actual inference time.

So it usually shows up as overfitting, but it is really about the model having information at training time that it shouldn't have.
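A minimal sketch of the first case (test rows slipping into the training set), assuming a 1-nearest-neighbour "model" and labels that are pure noise, so that no honest evaluation can do much better than chance. The data and helper names are invented for illustration.

```python
import random

def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, test):
    hits = sum(nearest_label(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)
# Labels are random coin flips, so there is nothing real to learn.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, test = data[:300], data[300:]

clean_acc = accuracy(train, test)         # test rows never seen in training
leaky_acc = accuracy(train + test, test)  # test rows leaked into training

print(f"clean: {clean_acc:.2f}  leaky: {leaky_acc:.2f}")
```

With the leak, every test point finds itself in the training set at distance zero, so the score looks perfect even though the model has learned nothing. The clean score should hover around chance.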

These are two different definitions. Can someone please disambiguate?