| 1. > It's theoretically impossible to guarantee a convergence to global optima using gradient descent if the function is non-convex. This is false. See, e.g., [0][1]. 2. I'm not really sure what the question is here. 3. If your loss is bounded from below (it is a square norm) by 0 and you achieve 0 loss, this means that 0 is a global optimum, since, by definition, no other objective value can be smaller than this number. --- [0] Theorem A.2 in Udell's Generalized Low-Rank models paper https://arxiv.org/pdf/1410.0342.pdf [1] B&V Convex Optimization (https://web.stanford.edu/~boyd/cvxbook/), Appendix B.1. In fact, I can't find the reference right now, but you can easily prove that GD with an appropriate step-size converges to a global optimum on this problem when initialized at (0,0), even though the problem is non-convex. |
I think the OP intended and should have written:
"It's theoretically impossible to guarantee a convergence to global optima using gradient descent for an arbitrary non-convex function."
For example consider the function f(x)=sin^2(pi * x)+sin^2(pi * N/x) this function has multiple global minima at the divisors of N, where it is f(x)==0, if x or N/x is non-integer, it is guaranteed to be positive...
I am not taking a stance on if gradient descent does or does not guarantee finding global minima and is thus able to factorize cryptographic grade RSA products of primes, but the claim does appear to imply it.
Edit: the multiply symbols changed some cursive