by blahblah3 3532 days ago
This can be mitigated by evaluating all the models on a hold-out test set (similar to what Kaggle does and what was done in the Netflix Prize). The multiple comparisons problem is also mitigated by the fact that the models won't be completely random; there will likely be some positive correlation between them.

edit: Also, by Hoeffding's inequality, the number of training examples needed for a given level of confidence is only logarithmic in the number of models (even assuming they are independent). See page 6 here: http://cs229.stanford.edu/notes/cs229-notes4.pdf
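
To make the logarithmic dependence concrete, here is a rough sketch in Python. It assumes the usual Hoeffding-plus-union-bound form, m >= log(2k/delta) / (2*gamma^2), where k is the number of models, gamma is the tolerated gap between measured and true error, and delta is the allowed probability that any model's estimate is off by more than gamma. The function name and the example values for gamma and delta are mine, picked only for illustration.

```python
import math


def samples_needed(k, gamma, delta):
    """Examples sufficient so that, with probability >= 1 - delta, all k
    models have measured error within gamma of their true error.

    Assumed bound (Hoeffding + union bound over k models):
        m >= log(2k / delta) / (2 * gamma^2)
    """
    if k < 1 or gamma <= 0 or not (0 < delta < 1):
        raise ValueError("need k >= 1, gamma > 0, 0 < delta < 1")
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))


if __name__ == "__main__":
    # Illustrative settings only: 5% tolerance, 5% failure probability.
    for k in (1, 10, 100, 1000, 10000):
        print(k, samples_needed(k, gamma=0.05, delta=0.05))
```

Each tenfold increase in the number of models adds the same fixed number of examples (about log(10) / (2*gamma^2)) rather than multiplying the requirement, which is the whole point. Since this is a worst-case bound, positively correlated models should need fewer in practice.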