Hacker News
by sytelus 2780 days ago
There is a bit of a difference between fitting a dataset to some convenient parameterized function and finding the global minimum of a non-convex function. Also, the paper claims that this can be done in polynomial time.

> The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).


> There is a bit of a difference between fitting a dataset to some convenient parameterized function and finding the global minimum of a non-convex function

What's the difference? Any point where the loss is zero is a global minimum, since the training loss (e.g. squared error) can never be negative.
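A minimal sketch of that point, using a hypothetical two-parameter toy model f = a*b fit to a single target y = 2 with squared error (this is not the ResNet setting from the paper). The loss is non-convex, yet every point with a*b = 2 has zero loss and is therefore a global minimum. Plain gradient descent from (0.5, 0.5) reaches one of them. The names `loss`, `grad` and `gradient_descent`, the learning rate and the step count are all illustrative assumptions.

```python
def loss(a, b, y=2.0):
    """Squared-error loss 0.5 * (a*b - y)**2 for the toy model f = a*b."""
    return 0.5 * (a * b - y) ** 2


def grad(a, b, y=2.0):
    """Gradient of the toy loss with respect to (a, b)."""
    r = a * b - y  # residual of the single data point
    return r * b, r * a


def gradient_descent(a, b, y=2.0, lr=0.1, steps=1000):
    """Plain gradient descent with a fixed step size; returns final (a, b)."""
    for _ in range(steps):
        ga, gb = grad(a, b, y)
        a, b = a - lr * ga, b - lr * gb
    return a, b


# Non-convexity check: (1, 2) and (2, 1) both have zero loss, but their
# midpoint (1.5, 1.5) does not, which a convex function would not allow.
print(loss(1.0, 2.0), loss(2.0, 1.0), loss(1.5, 1.5))  # 0.0 0.0 0.03125

# Gradient descent from a symmetric start stays on a = b and converges
# toward a = b = sqrt(2), one of infinitely many zero-loss global minima.
a, b = gradient_descent(0.5, 0.5)
print(a * b, loss(a, b))
```

The toy converges easily because it is tiny; which global minimum it lands on depends on the starting point. Showing that gradient descent reaches zero loss for a deep over-parameterized ResNet, within polynomial time, is what the quoted paper claims.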