|
|
|
|
|
by hyperbovine
3225 days ago
|
|
In fact recent research indicates that you can randomly relabel the training examples and the network still achieves zero training error (https://arxiv.org/abs/1611.03530). So it is not "understanding" anything intrinsic or fundamental about the letter "A". Rather, it's just storing training examples somewhere inside of its millions of parameters, which sounds a lot less impressive. |
|
A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?"
To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence.