Hacker News
by dotnet00 743 days ago
I think a popular example of leakage is the tank-recognition AI that handles its training/test data perfectly but fails in real use. All the photos of one country's tanks happen to have trees in the background, while those of the other country's do not, so the background effectively leaks the image label and the model learns to look for trees instead of tanks. Training less or using fewer parameters wouldn't fix it: the model would still take the easiest route and detect tree features. You'd have to change the training data.
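To make that concrete, here is a toy sketch of the same failure. Everything in it is made up for illustration: the two numeric features (`tank`, a noisy but real signal, and `tree`, the background cue), the noise level, and the tiny hand-rolled logistic regression standing in for an image model. It only assumes that the tree cue matches the label perfectly in the collected data and is unrelated to it in deployment.

```python
import math
import random


def make_data(n, rng, tree_leaks_label):
    """Return rows of ((tank, tree), label). All features are hypothetical."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 2 * y - 1
        tank = sign + rng.gauss(0.0, 1.5)  # genuine but noisy signal
        # In the leaky data the background cue is a copy of the label;
        # otherwise it is a coin flip unrelated to the label.
        tree = float(sign) if tree_leaks_label else rng.choice([-1.0, 1.0])
        rows.append(((tank, tree), y))
    return rows


def train(rows, steps=300, lr=0.5):
    """Plain full-batch gradient descent on the logistic loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(steps):
        g_tank = g_tree = g_b = 0.0
        for (tank, tree), y in rows:
            z = w[0] * tank + w[1] * tree + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            g_tank += err * tank
            g_tree += err * tree
            g_b += err
        w[0] -= lr * g_tank / n
        w[1] -= lr * g_tree / n
        b -= lr * g_b / n
    return w, b


def accuracy(model, rows):
    w, b = model
    hits = 0
    for (tank, tree), y in rows:
        pred = 1 if w[0] * tank + w[1] * tree + b > 0 else 0
        hits += pred == y
    return hits / len(rows)


rng = random.Random(0)
leaky_train = make_data(1000, rng, tree_leaks_label=True)
leaky_test = make_data(1000, rng, tree_leaks_label=True)   # same flawed collection
deployment = make_data(1000, rng, tree_leaks_label=False)  # trees no longer track labels
fixed_train = make_data(1000, rng, tree_leaks_label=False)

leaky_model = train(leaky_train)
fixed_model = train(fixed_train)

leaky_test_acc = accuracy(leaky_model, leaky_test)
leaky_deploy_acc = accuracy(leaky_model, deployment)
fixed_deploy_acc = accuracy(fixed_model, deployment)

print("leaky model, held-out test:", leaky_test_acc)
print("leaky model, deployment:   ", leaky_deploy_acc)
print("fixed-data model, deployment:", fixed_deploy_acc)
print("leaky model weights [tank, tree]:", leaky_model[0])
```

In this setup the leaky model should look excellent on its held-out split, since that split shares the flaw, and fall to roughly coin-flip accuracy once the tree cue stops tracking the label. Lowering `steps` doesn't rescue it, because the tree feature is still the cleanest signal available. The model trained on data without the leak is limited by how noisy the tank feature is, but it holds up in deployment.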