| HN Mirror

	by stiff 4171 days ago
	If you use performance on the test set for model selection, this is not true. It follows from simple probabilistic reasoning: the more models you try, the higher the chance that one of them will score well on both the training set and the test set by "luck", and this is especially true with small datasets. In fact, it is best practice to use a separate validation set for model selection and to use the test set only for the final performance evaluation; see e.g. the answer to this question: http://stats.stackexchange.com/questions/9357/why-only-three...
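The "luck" effect is easy to see in a toy simulation. The sketch below is only an illustration under stated assumptions, not anything from the linked answer: every "model" is a pure coin-flip guesser (true accuracy 0.5), and the names `accuracy` and `simulate` and all the sizes are made up for the example. It compares the test accuracy you would report when the test set itself picks the winner with the test accuracy of the winner picked on a separate validation set.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n_models=100, n_val=50, n_test=50, n_trials=100, seed=0):
    """Compare two ways of reporting the score of the 'best' of many models.

    Assumption for illustration: every model guesses at random, so the
    honest accuracy of any of them is 0.5.
    """
    rng = random.Random(seed)
    picked_on_test = []  # reported test score when the test set chose the model
    picked_on_val = []   # reported test score when a validation set chose it

    for _ in range(n_trials):
        val_labels = [rng.getrandbits(1) for _ in range(n_val)]
        test_labels = [rng.getrandbits(1) for _ in range(n_test)]

        scores = []  # (validation accuracy, test accuracy) per model
        for _ in range(n_models):
            val_preds = [rng.getrandbits(1) for _ in range(n_val)]
            test_preds = [rng.getrandbits(1) for _ in range(n_test)]
            scores.append((accuracy(val_preds, val_labels),
                           accuracy(test_preds, test_labels)))

        # Selecting on the test set: the reported number is the maximum itself.
        picked_on_test.append(max(test_acc for _, test_acc in scores))

        # Selecting on the validation set: the test set is touched only once,
        # for the model that won on validation.
        best_by_val = max(scores, key=lambda s: s[0])
        picked_on_val.append(best_by_val[1])

    mean = lambda xs: sum(xs) / len(xs)
    return mean(picked_on_test), mean(picked_on_val)


if __name__ == "__main__":
    biased, honest = simulate()
    print(f"selected on test set:       mean reported accuracy {biased:.3f}")
    print(f"selected on validation set: mean reported accuracy {honest:.3f}")
```

With these settings the first number should come out well above 0.5 even though no model is better than chance, while the second should stay close to 0.5. Shrinking `n_test` or raising `n_models` widens the gap, which matches the point about small datasets and trying many models.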