HN Mirror



	by sytelus 2780 days ago
	There is a bit of a difference between fitting a dataset to some convenient parameterized function and finding the global minimum of a non-convex function. Also, the paper claims that this can be done in polynomial time:

	> The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet).

1 comment

> There is a bit of a difference between fitting a dataset to some convenient parameterized function and finding the global minimum of a non-convex function

What's the difference? Any point where the loss is zero is a global minimum, since a squared-error style loss cannot go below zero.
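To make the point concrete, here is a minimal sketch on a toy problem, not the ResNet setting from the paper. It assumes a hypothetical two-parameter model f(x) = a * b * x with a mean squared-error loss; the names `loss`, `grad`, and `gradient_descent`, the data, the starting point, and the learning rate are all made up for illustration. The loss is non-convex in (a, b), yet gradient descent from this starting point drives it to (numerically) zero, and since the loss is non-negative, that point is a global minimum.

```python
# Toy model f(x) = a * b * x with mean squared-error loss.
# The loss is non-convex in (a, b): (a, b) and (-a, -b) give the same loss,
# but their midpoint (0, 0) does not.

def loss(a, b, data):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((a * b * x - y) ** 2 for x, y in data) / len(data)

def grad(a, b, data):
    """Gradient of the loss with respect to a and b."""
    n = len(data)
    ga = sum(2 * (a * b * x - y) * b * x for x, y in data) / n
    gb = sum(2 * (a * b * x - y) * a * x for x, y in data) / n
    return ga, gb

def gradient_descent(a, b, data, lr=0.05, steps=2000):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    for _ in range(steps):
        ga, gb = grad(a, b, data)
        a, b = a - lr * ga, b - lr * gb
    return a, b

data = [(1.0, 2.0), (2.0, 4.0)]          # samples of y = 2x
a, b = gradient_descent(0.5, 0.5, data)  # arbitrary starting point
final_loss = loss(a, b, data)

# Non-convexity check: two minimizers whose midpoint has strictly higher loss.
midpoint_loss = loss(0.0, 0.0, data)

print(final_loss, a * b, midpoint_loss)
```

Here `final_loss` ends up at floating-point zero while `midpoint_loss` is strictly positive, so the loss surface is non-convex even though every zero-loss point is a global minimum. What the toy does not show is the hard part the paper is about: a guarantee that gradient descent reaches such a point, and does so in polynomial time, for a deep network.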