by jimfleming 3225 days ago
That is not a conclusion that can be drawn from the findings in the paper. While the models they evaluate can achieve zero training error on random labels, the test error in that setting is obviously not zero: a model fit to random labels doesn't generalize at all. However, training on real labels often finds solutions which can generalize quite well.

A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?"

To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence.
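
A toy illustration of the distinction, for anyone who wants to poke at it. This is only a sketch and not the paper's setup: it uses a 1-nearest-neighbour classifier as a stand-in for a high-capacity model (my assumption, the paper uses deep nets trained with gradient descent), and the data, function names and sizes are all made up. It shows that the same memorizing model gets perfect training accuracy on random labels with chance-level test accuracy, while on real labels it also does well on held-out points.

```python
import math
import random

def make_data(n, rng):
    """Two well-separated 2-D Gaussian blobs; the true label is the blob index."""
    points, labels = [], []
    for _ in range(n):
        label = rng.randrange(2)
        center = 2.0 if label == 1 else -2.0
        points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        labels.append(label)
    return points, labels

def predict_1nn(train_x, train_y, x):
    """Return the label of the stored training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of (x, y) pairs that the 1-NN memorizer labels correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def run_experiment(seed=0, n=200):
    rng = random.Random(seed)
    train_x, train_y = make_data(n, rng)
    test_x, test_y = make_data(n, rng)
    # Random-label condition: labels are coin flips, independent of the inputs.
    rand_train_y = [rng.randrange(2) for _ in train_x]
    rand_test_y = [rng.randrange(2) for _ in test_x]
    return {
        "real_train": accuracy(train_x, train_y, train_x, train_y),
        "real_test": accuracy(train_x, train_y, test_x, test_y),
        "random_train": accuracy(train_x, rand_train_y, train_x, rand_train_y),
        "random_test": accuracy(train_x, rand_train_y, test_x, rand_test_y),
    }

if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name}: {value:.3f}")
```

Note this says nothing about why gradient descent on a deep net lands on the generalizing solution, which is the paper's actual question. It only shows that "can memorize random labels" and "generalizes on real labels" are not in conflict.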


But we also have empirical evidence that they generalize incredibly poorly, namely the existence of imperceptible (adversarial) perturbations which can transfer across images and networks and are catastrophically misclassified.
Adversarial examples don't really support the claim that deep models are just memorizing examples. If they were, they wouldn't generalize to unseen examples at all. However, the human brain is also susceptible to adversarial examples (e.g. optical illusions). Yet human brains still generalize quite well. Likewise, deep learning can both suffer from adversarial examples and generalize well.

Generalization is a multi-axis scale, not a switch: you can have more or less generalization in many different dimensions. Being terrible at adversarial examples just means that axis is weak.
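
For concreteness, here is a rough sketch of how small such a perturbation can be. It is a toy under stated assumptions, not a deep net: a linear classifier with random +/-1 weights in 1000 dimensions, perturbed in the spirit of the fast gradient sign method (for a linear score the gradient with respect to the input is just the weight vector). All names and sizes are made up for illustration, and it does not demonstrate transfer across networks.

```python
import random

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, x):
    """Linear classifier score; the predicted class is the sign of this value."""
    return sum(wi * xi for wi, xi in zip(w, x))

def smallest_flipping_eps(w, x, margin=1.01):
    """Per-coordinate step size (with a small safety margin) that flips the sign of the score."""
    l1_norm = sum(abs(wi) for wi in w)
    return margin * abs(score(w, x)) / l1_norm

def perturb(w, x, eps):
    """Move every coordinate by eps against the current decision (sign-of-gradient step)."""
    direction = -sign(score(w, x))
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]

def demo(dim=1000, seed=0):
    rng = random.Random(seed)
    w = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # hypothetical fixed weights
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]      # input with unit-scale coordinates
    eps = smallest_flipping_eps(w, x)
    x_adv = perturb(w, x, eps)
    return {"eps": eps, "before": score(w, x), "after": score(w, x_adv)}

if __name__ == "__main__":
    print(demo())
```

Each coordinate moves by the same eps, which is small relative to the unit scale of the input coordinates, yet the many small changes add up across dimensions and flip the predicted class. The same model can still classify unperturbed inputs correctly, which is the sense in which the adversarial axis can be weak while ordinary test accuracy is fine.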

Generalization error is a number. You can debate the probability space over which it should be computed, but it exists in one dimension. Otherwise statements like "generalize well" make no sense. Anyways, I'm not aware of any optical illusion that can make the brain confuse a house cat with a door knob. Yet it is apparently possible to make any one of a number of neural nets do so, using the same technique and with only subtle, imperceptible changes to the input. So they cannot possibly be learning any sort of intrinsic representation of these objects.

I'm guilty of being facetious in using the word "just", since of course deep nets (and all other serious ML algorithms I know of) are able to generalize to an extent. What's not clear to me is whether this represents some paradigm shift in AI, as is often claimed, or whether it's simply the consequence of fitting a hugely overparameterized function approximator to a web-scale amount of training data.