by ssivark 2198 days ago
Yes, but we’re going to be restricted to O(L) final accuracy no matter what, for gradient descent (we could choose second-order optimization methods, etc., but that’s an orthogonal point; we’re happy to get within an epsilon ball of the answer).

> Yes, but we’re going to be restricted to O(L) final accuracy no matter what

This is not true in general for smooth functions, as long as L is small enough. You can reach arbitrary accuracy with GD if L is smaller than roughly the reciprocal of the Lipschitz constant of the gradient of a differentiable objective function; L need not be arbitrarily small.
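
A minimal sketch of the point, assuming a 1-D quadratic f(x) = 0.5 * beta * x^2 as the objective (my choice for illustration, not something from the parent comment). Its gradient beta * x is Lipschitz with constant beta and the minimizer is x = 0. The names `gradient_descent`, `beta`, `small` and `large` are hypothetical. With a fixed step below 1/beta the error keeps shrinking geometrically with the iteration count rather than stalling at O(L); with a step above 2/beta the iterates diverge.

```python
def gradient_descent(grad, x0, step, iters):
    """Plain fixed-step gradient descent on a scalar variable."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x


# Assumed toy objective: f(x) = 0.5 * beta * x**2, minimized at x = 0.
# Its gradient, beta * x, is Lipschitz with constant beta.
beta = 4.0


def grad(x):
    return beta * x


# Fixed step below 1/beta = 0.25: each iteration multiplies x by (1 - 0.2 * beta).
small = gradient_descent(grad, x0=1.0, step=0.2, iters=200)

# Fixed step above 2/beta = 0.5: each iteration multiplies x by (1 - 0.6 * beta),
# whose magnitude exceeds 1, so the iterates grow.
large = gradient_descent(grad, x0=1.0, step=0.6, iters=200)

print("step 0.2 -> |x| =", abs(small))
print("step 0.6 -> |x| =", abs(large))
```

The step stays fixed at 0.2 throughout, yet running more iterations keeps reducing the error, so the final accuracy here is limited by the iteration count and floating point, not by the size of L.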