by fwilliams
2776 days ago
It's worth noting that the primary result of this paper concerns only the error on the training data under empirical risk minimization. Zero training error does not imply a model that generalizes. For any optimization problem over a finite training set, you can always add enough parameters to achieve zero error (imagine introducing enough variables to fully memorize the map from inputs to labels). The major contribution of the work is showing that a ResNet needs a number of parameters that is only polynomial in the dataset size for gradient descent to converge to a global optimum, in contrast to traditional neural nets, which require an exponential number of parameters.

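To make the memorization point concrete, here is a minimal sketch (not anything from the paper): a "model" that is just a lookup table with one entry per training example. The data is hypothetical, distinct integer inputs with purely random binary labels, and the helper names `fit_lookup_table`, `predict`, and `error_rate` are made up for illustration.

```python
import random


def fit_lookup_table(inputs, labels):
    """'Train' by storing one table entry (one parameter) per training example."""
    return dict(zip(inputs, labels))


def predict(table, x, default=0):
    # Unseen inputs fall back to a constant guess: nothing was learned about them.
    return table.get(x, default)


def error_rate(table, inputs, labels):
    wrong = sum(predict(table, x) != y for x, y in zip(inputs, labels))
    return wrong / len(inputs)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical data: the labels are random, so there is no pattern to generalize.
train_x = list(range(0, 1000))
test_x = list(range(1000, 2000))
train_y = [rng.randint(0, 1) for _ in train_x]
test_y = [rng.randint(0, 1) for _ in test_x]

table = fit_lookup_table(train_x, train_y)
train_error = error_rate(table, train_x, train_y)
test_error = error_rate(table, test_x, test_y)

print(f"parameters: {len(table)}")
print(f"train error: {train_error:.3f}")
print(f"test error:  {test_error:.3f}")
```

The training error is exactly zero because every training input has its own entry, while the test error sits near chance. The parameter count here grows linearly with the dataset, which is why reaching zero training error is not surprising on its own; the interesting part is how many parameters gradient descent needs to get there.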
> The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).