by gwern 3128 days ago
Well, they consider RL problems extensively, and as the joke goes, in RL it's OK to overfit to your validation set - if you can.

As for regular supervised learning: it's no worse than, say, early stopping based on validation scores. It should be wrong, but in practice NNs generalize anyway, and since this paper implies that Google Brain & DM are now doing this hyperparameter optimization routinely for everything, I figure that they would have noticed any overfitting problems by now (either when the methods failed to outperform on one of Google's huge private internal datasets, or when they rolled out the translator).
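
To make the "it should be wrong" part concrete, here is a toy sketch of the selection effect in question. It is not from the paper: the function name, the coin-flip "models", and all the sizes are assumptions chosen purely for illustration. Every configuration has a true accuracy of exactly 0.5, yet the best validation score among many of them looks clearly better than chance, while fresh held-out data does not.

```python
import random


def selection_bias_demo(n_configs=200, n_val=100, n_test=10000, seed=0):
    """Pick the best of many equally useless 'models' by validation accuracy.

    Hypothetical illustration: each config is a coin flip, so its true
    accuracy is 0.5 regardless of which one gets selected.
    """
    rng = random.Random(seed)
    # Validation accuracy of each config on a small validation set.
    val_scores = [
        sum(rng.random() < 0.5 for _ in range(n_val)) / n_val
        for _ in range(n_configs)
    ]
    # Selecting on validation score reports the luckiest draw.
    best_val = max(val_scores)
    # The selected config on fresh held-out data is still a coin flip.
    test_score = sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
    return best_val, test_score


best_val, test_score = selection_bias_demo()
print(f"best validation accuracy: {best_val:.2f}")
print(f"held-out accuracy:        {test_score:.3f}")
```

The gap between the two numbers is the optimism you buy by selecting on the validation set. The point above is that for real NNs with large validation sets this gap seems to be small enough in practice not to matter.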