
Splitting the learning set into training and validation sets is very important. The validation set is used to select the solutions that have really generalized (or understood) the problem, rather than memorized the examples. When you use all of the examples for training, the algorithm can overfit: it produces a solution that performs very well on the training examples but poorly when you use it, for real, on unseen text. Holding out part of the examples for validation leads to better solutions.
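To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the helper names (`split_learning_set`, `select_best`, `accuracy`), the 30% validation fraction, and the toy "is the number even?" task are my assumptions, not anything from a specific tool. It compares a candidate that only memorizes the training examples with one that applies a general rule, and picks the winner using the held-out examples.

```python
import random


def split_learning_set(examples, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and return (training, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predict, examples):
    """Fraction of (text, label) pairs that the candidate labels correctly."""
    return sum(predict(text) == label for text, label in examples) / len(examples)


def select_best(candidates, validation):
    """Pick the candidate with the highest accuracy on the held-out examples."""
    return max(candidates, key=lambda candidate: accuracy(candidate, validation))


# Toy learning set (an assumption for this sketch): label is True for even numbers.
examples = [(f"item {i}", i % 2 == 0) for i in range(10)]
training, validation = split_learning_set(examples)

# Candidate 1 memorizes the training examples and knows nothing else.
memory = dict(training)


def memorizer(text):
    return memory.get(text)  # None for any text it has not seen


# Candidate 2 applies a rule that works for any input of this shape.
def general_rule(text):
    return int(text.split()[1]) % 2 == 0


best = select_best([memorizer, general_rule], validation)
print("training accuracy of memorizer:  ", accuracy(memorizer, training))
print("validation accuracy of memorizer:", accuracy(memorizer, validation))
print("selected:", best.__name__)
```

Both candidates are perfect on the training examples, so training performance alone cannot tell them apart. Only the held-out examples reveal that the memorizer has not generalized.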