|
|
|
|
|
by jimfleming
3225 days ago
|
|
That is not a conclusion that can be drawn from the findings in the paper. While the models they evaluate can achieve zero training error on random labels, the test error is obviously not zero: it doesn't generalize at all. However, training on real labels often finds solutions which can generalize quite well. A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?" To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence. |
|