Usually I develop models with a train/validation/test split, measuring results on the validation set to decide how many epochs to train for. Then I burn the test set to evaluate performance. Then I train from scratch on the entire dataset (no split), using that same number of epochs. Is this number of epochs still optimal now that the training data is different? Of course not. But when you use regularization and other methods to combat overfitting appropriately, your training is not going to be overly sensitive to changes in epoch number anyway.
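To make that workflow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not my actual setup: the toy linear dataset, the 200/50/50 split, the SGD hyperparameters, the weight decay standing in for "regularization", and the helper names (`make_data`, `pick_epochs`, `run_workflow`).

```python
import random


def make_data(n, seed=0):
    """Toy regression data: y = 3x + 0.5 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 0.5 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def mse(w, b, data):
    """Mean squared error of the linear model w*x + b on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def sgd_epoch(w, b, data, lr=0.05, weight_decay=1e-3):
    """One pass of per-sample SGD; weight decay acts as the regularizer."""
    for x, y in data:
        err = w * x + b - y
        w -= lr * (2 * err * x + weight_decay * w)
        b -= lr * 2 * err
    return w, b


def train(data, epochs, on_epoch_end=None):
    """Train from scratch for a fixed number of epochs."""
    w, b = 0.0, 0.0
    for epoch in range(1, epochs + 1):
        w, b = sgd_epoch(w, b, data)
        if on_epoch_end is not None:
            on_epoch_end(epoch, w, b)
    return w, b


def pick_epochs(train_set, val_set, max_epochs=50):
    """Return the epoch count with the lowest validation loss."""
    history = []
    train(train_set, max_epochs,
          lambda e, w, b: history.append((mse(w, b, val_set), e)))
    return min(history)[1]  # ties resolve to the earlier epoch


def run_workflow():
    data = make_data(300)
    train_set, val_set, test_set = data[:200], data[200:250], data[250:]

    # 1. Use the validation set only to choose the number of epochs.
    best_epochs = pick_epochs(train_set, val_set)

    # 2. Burn the test set once to estimate performance.
    w, b = train(train_set, best_epochs)
    test_mse = mse(w, b, test_set)

    # 3. Retrain from scratch on everything, reusing the same epoch count.
    final_w, final_b = train(data, best_epochs)
    return best_epochs, test_mse, (final_w, final_b)


best_epochs, test_mse, final_model = run_workflow()
print(best_epochs, test_mse, final_model)
```

Note that the final model in step 3 has no held-out data left to score it, so the test estimate from step 2 is the only performance number you get. That is the trade-off being accepted in exchange for training on all the data.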