by sytelus
2780 days ago
There is a bit of a difference between fitting a dataset with some convenient parameterized function and finding the global minimum of a non-convex function. Also, the paper claims that this can be done in polynomial time:

> The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).
|
What's the difference? Any point where a non-negative loss is zero is a global minimum.
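
To spell that step out, here is a short sketch. It assumes the training loss is non-negative everywhere (true of squared error, for example); the symbols L, θ and θ* are just notation introduced here, not taken from the thread or the paper.

```latex
% Assumption: the loss is bounded below by zero.
L(\theta) \ge 0 \quad \text{for all } \theta .
% If some parameter vector \theta^\ast attains zero loss, then
L(\theta^\ast) = 0 \le L(\theta) \quad \text{for all } \theta ,
% so \theta^\ast is a global minimizer, whether or not L is convex.
```

Note this only says that a zero-loss point, once found, is globally optimal for the training objective. It does not by itself say such a point exists or that gradient descent reaches it, which is the part the quoted result is about.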