Hacker News new | ask | show | jobs
by nshm 2779 days ago
The contribution of the paper is the estimation of the convergence speed and number of parameters in neural network, that seems a valuable point.
1 comments

The paper states for the main result:

>In this section, we show gradient descent with a constant positive step size converges to the global minimum with a linear rate.

This is rather ambiguous: it sounds like it guarantees "it WILL converge to the global minimum, btw at a linear rate" but I suspect they are really saying "IF it converges to the global minimum, THEN it will converge at a linear rate" in a way to purpousefully sound like the first statement.

could you comment on if GD does or does not find the global minimum of the integer factorization cost function in the following comment?

https://news.ycombinator.com/item?id=18439287

Another sin they have is that they write "with high probability" in the theorem but they do not strictly define that in the main section of the paper. If you look into appendix you'll see that they guarantee the probability 1 - \delta and delta affects the convergence. It means that if you add many many parameters then you just need a couple attempts to converge well. So "IF it converges" is becoming much better "VERY OFTEN it converges". Sorry, no real passion to read the paper in details, so this is just an intuition from a quick look.