by XenophileJKO 743 days ago
No, usually it means that the data you intend to test the model on was accidentally used to train it. There are also more complex scenarios where you get leakage without ever showing the model the test examples, such as features that contain future information you won't have at actual inference time.

So it usually results in overfitting, but it is really about the model having information at training time that it shouldn't have.
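
As a rough illustration of the two cases above, here is a minimal sketch in plain Python. The data and the helper names (`overlap`, `trailing_mean`, `centered_mean`) are made up for the example and are not from any particular library. The first helper looks for test rows that were also trained on; the other two contrast a feature built only from past values with one that peeks at later values.

```python
# Toy, hypothetical data and helpers; purely illustrative.

def overlap(train_rows, test_rows):
    """Return test rows that also appear in the training set (direct leakage)."""
    train_set = set(train_rows)
    return [row for row in test_rows if row in train_set]


def trailing_mean(values, i, window):
    """Mean of up to `window` values strictly before index i.

    Uses only the past, so it is available at inference time.
    """
    past = values[max(0, i - window):i]
    return sum(past) / len(past) if past else None


def centered_mean(values, i, window):
    """Mean over a window centred on index i.

    Includes the value at i and values after i, so it carries
    information that will not exist when predicting at time i.
    """
    lo = max(0, i - window)
    hi = min(len(values), i + window + 1)
    neighbours = values[lo:hi]
    return sum(neighbours) / len(neighbours)


# Case 1: a test example that was also used for training.
train = [("a", 1), ("b", 2), ("c", 3)]
test = [("c", 3), ("d", 4)]
leaked = overlap(train, test)

# Case 2: a feature that contains future information.
prices = [10, 11, 12, 20, 21]
safe = trailing_mean(prices, 3, 2)   # built from prices[1:3] only
leaky = centered_mean(prices, 3, 2)  # built from prices[1:5], including later values

print("test rows seen in training:", leaked)
print("trailing feature:", safe, "| centered feature:", leaky)
```

In this toy series the centered feature already reflects the jump at index 3 and the value after it, which is exactly what a model would not be able to see at prediction time.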