by eru 1403 days ago
> Once or many matters because if you just examine each data point once, there can be no overfitting in the traditional sense.

What makes you so sure? If my algorithm were literally just storing stuff in a hashtable to look up later, you'd get overfitting from a single exposure.
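
To make that concrete, here is a minimal sketch of such a lookup "model". The class name `LookupModel`, the noise labels, and the dataset sizes are all made up for illustration; the point is only that one pass over the data is enough to memorize it perfectly while learning nothing that transfers.

```python
import random


class LookupModel:
    """Hypothetical 'model' that just stores every example in a dict."""

    def __init__(self, default=0):
        self.table = {}
        self.default = default

    def fit_one(self, x, y):
        # "Training" is a single hashtable write per example.
        self.table[x] = y

    def predict(self, x):
        # Unseen inputs fall back to a fixed default guess.
        return self.table.get(x, self.default)


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
# Labels are coin flips, so there is nothing real to learn (an assumption
# chosen to make the memorization effect as stark as possible).
train = [(i, rng.randint(0, 1)) for i in range(1000)]
test = [(i, rng.randint(0, 1)) for i in range(1000, 2000)]

model = LookupModel()
for x, y in train:  # each training example is seen exactly once
    model.fit_one(x, y)

train_acc = accuracy(model, train)  # perfect recall of the training set
test_acc = accuracy(model, test)    # roughly chance on unseen inputs
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is perfect after a single exposure, while test accuracy stays near chance, which is overfitting by any usual definition.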

1 comment

Well, gradient descent doesn't do that. And the models, while big in terms of parameter count, are not nearly big enough to actually store all the training data.

Think of it in terms of updating beliefs about the target distribution. With backpropagation, you predict based on the input, then update your beliefs according to how wrong you were. So in a sense it's unsound to re-use data: your beliefs already incorporate it! And traditional overfitting is exactly that: it's what happens when you've used up all the information in your training data. This was many people's objection to neural nets. I thought it was a good objection at the time, and I thought the future lay with more "sound" methods (which performed better on most metrics back then anyway) rather than with dodgy biomimicry that wasn't really even similar to biological brains.
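
The predict-then-update loop can be sketched roughly like this. This is a toy, not a neural net: it assumes a two-weight linear model trained with plain SGD on squared error, and the names `sgd_single_pass`, `true_w`, and the learning rate are invented for the example. What it shows is that when every example is scored before the model updates on it, each recorded loss is a loss on unseen data.

```python
import random


def sgd_single_pass(stream, dim, lr=0.05):
    """One pass of SGD on squared error; every example is used exactly once."""
    w = [0.0] * dim
    losses = []
    for x, y in stream:
        # Predict first, using only what was learned from earlier examples.
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        losses.append(err * err)  # scored before the update, so x is still unseen
        # Then update in proportion to how wrong the prediction was
        # (the constant factor 2 of the gradient is folded into lr).
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, losses


rng = random.Random(0)
true_w = [2.0, -1.0]  # hypothetical target the stream is generated from


def make_example():
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    y = sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0, 0.1)
    return x, y


stream = [make_example() for _ in range(2000)]
w, losses = sgd_single_pass(stream, dim=2)

early = sum(losses[:100]) / 100   # average loss on the first 100 examples
late = sum(losses[-100:]) / 100   # average loss on the last 100 examples
print(f"early loss: {early:.3f}, late loss: {late:.3f}, weights: {w}")
```

Because no example is ever revisited, the running loss here doubles as a held-out estimate; there is no separate training loss that could drift below it.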

But yes, there are other types of overfitting if you want to get philosophical about it. It's just that the one that I and everyone else used to worry about, from training too much on your data, just isn't important anymore. And most of those clever principled and less-principled regularization methods just don't matter anymore!