by ntenenz 3286 days ago
Holy data leakage, Batman!

Because the data augmentation is performed before the random train/val/test split, nearly identical instances can land in more than one set. In other words, a 1-degree rotation of training image X may show up in the validation or test set, which would artificially inflate the reported performance.
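One way to avoid this, as a rough sketch: split the original images first, then augment inside each split. The function name `split_then_augment`, the split fractions, and the rotation angles below are all hypothetical, and each "image" is just an ID paired with an angle rather than real pixel data.

```python
import random

def split_then_augment(image_ids, angles=(-1, 0, 1),
                       val_frac=0.2, test_frac=0.2, seed=0):
    """Split the ORIGINAL image IDs first, then augment within each split."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for a reproducible split

    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    parts = {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }

    # Hypothetical stand-in for augmentation: each sample is (source_id, angle).
    # A real pipeline would rotate the pixels here instead.
    return {
        name: [(src, angle) for src in members for angle in angles]
        for name, members in parts.items()
    }

splits = split_then_augment(range(10))

# Collect which original images feed each split.
sources = {name: {src for src, _ in samples} for name, samples in splits.items()}

# No original image contributes samples to more than one split.
assert sources["train"].isdisjoint(sources["val"])
assert sources["train"].isdisjoint(sources["test"])
assert sources["val"].isdisjoint(sources["test"])
```

In practice you would probably augment only the training split and evaluate on unmodified images, but the key point is the ordering: every rotated copy of image X stays in the same set as X.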