Hacker News new | ask | show | jobs
by fwilliams 2776 days ago
It's worth noting that the primary result of this paper has only to do with the error on the training data under empirical risk minimization. Zero training error =/= a model that generalizes. For any optimization problem, you can always add enough parameters to achieve zero error on a problem over a finite training set (imagine introducing enough variables to fully memorize the map from inputs to labels).

The major contribution of the work is showing that ResNet needs a number of parameters which is polynomial in the dataset size to converge to a global optimum in contrast to traditional neural nets which require an exponential number of parameters.

2 comments

There is bit of difference between fitting dataset to some convenient parameterized function vs finding global minima of non-convex function. Also, paper claims that this can be done in polynomial time.

> The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).

There is bit of difference between fitting dataset to some convenient parameterized function vs finding global minima of non-convex function

What's the difference? Any point where the loss is zero is global minimum.

The way to tackle the problem you state would be finding similar bounds with regularization.