|
|
|
|
|
by ssivark
2198 days ago
|
|
Yes, but we’re going to be restricted to O(L) final accuracy no matter what, for gradient descent (we could choose second order optimizations, etc, but that’s an orthogonal point — we’re happy to get within an epsilon ball of the answer). |
|
This is not, in general, true for smooth functions so long as L is small enough (you can reach arbitrary accuracy with GD if L is smaller than ~ the reciprocal of the Lipchitz constant of a differentiable objective function but it need not be arbitrarily small).