Hacker News
by Bill_Dimm 4424 days ago
Cross-validation (actually, this is mentioned toward the end of the article). Basically, fit the classifier on a subset of the data and test its predictions on the remainder. If the model is overfit, predictions for the out-of-sample data will be noticeably worse than for the data it was trained on.

http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...
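
To make the idea concrete, here is a rough sketch of a single hold-out split. Everything in it is an assumption for illustration: the synthetic 1-D data, the 30% label noise, the 150/50 split, and the choice of a 1-nearest-neighbour classifier (picked because it memorises the training set, so the gap between in-sample and out-of-sample accuracy is easy to see). None of this comes from the article.

```python
import random

def make_data(n, noise, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # inject label noise so there is something to overfit
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def accuracy(train, points):
    """Fraction of `points` whose label matches the 1-NN prediction from `train`."""
    hits = sum(predict_1nn(train, x) == y for x, y in points)
    return hits / len(points)

rng = random.Random(0)
data = make_data(200, noise=0.3, rng=rng)

# Fit on a subset, test on the remainder (the points are already in random order).
train, test = data[:150], data[150:]

train_acc = accuracy(train, train)  # each training point is its own nearest neighbour
test_acc = accuracy(train, test)    # held-out points expose the memorised noise

print(f"in-sample accuracy:     {train_acc:.2f}")
print(f"out-of-sample accuracy: {test_acc:.2f}")
```

The in-sample score says nothing useful here, because the classifier just looks up the point it was given. The held-out score is the one to watch.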

1 comments

Yeah, this one is quite intuitive, but it reduces the training sample size, since part of the data is held out for testing.
Once you find the optimal parameters, you can then train the model again on the entire dataset.
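
Something like the following is what I have in mind, as a minimal sketch rather than a recipe. The data, the k-nearest-neighbour model, the candidate values of k, and the 5-fold split are all made up for the example. The point is only the shape of the workflow: score each parameter setting on held-out folds, pick the best, then build the final model from all of the data.

```python
import random

def make_data(n, noise, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def predict_knn(train, x, k):
    """Majority vote among the k training points closest to x (k odd avoids ties)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return int(2 * votes > k)

def cv_accuracy(data, k, n_folds=5):
    """Mean held-out accuracy of k-NN across n_folds folds."""
    scores = []
    for i in range(n_folds):
        held_out = data[i::n_folds]  # every n_folds-th point, starting at index i
        training = [p for j, p in enumerate(data) if j % n_folds != i]
        hits = sum(predict_knn(training, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / n_folds

rng = random.Random(0)
data = make_data(200, noise=0.3, rng=rng)

# Step 1: use cross-validation only to choose the parameter.
candidates = [1, 5, 15, 31]
scores = {k: cv_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)

# Step 2: the final model uses every data point, so nothing is wasted.
def final_model(x):
    return predict_knn(data, x, best_k)

for k in candidates:
    print(f"k={k:2d}  cross-validated accuracy={scores[k]:.2f}")
print("chosen k:", best_k)
```

The cross-validated score is an estimate for models trained on 4/5 of the data, so the final model trained on all of it should do about as well or slightly better, but you no longer have held-out data to confirm that.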