Hacker News new | ask | show | jobs
by T_D_K 3218 days ago
I'm getting stuck trying to understand the equations in assumptions 1 and 2. Can anyone point me to a resource that explains the idea behind them? Wikipedia is a bit terse, and I'm not having any luck googling for "gradient-lipschitz and hessian-lipschitz" and variations.

On the notation side, am I correct in thinking that "<del>f(x_n) is the partial derivative w.r.t. x_n? And that the elements of the vector x are the parameters against which a "cost function" (f) is computed? But that doesn't seem right. Maybe x_n is a point in R^N, and therefore <del>f(x_n) is the derivative at that point?

2 comments

∇f(x_1) is the gradient of f evaluated at x_1, a point in R^N.

The first equation indicates that for any two points in R^N, the maximum norm of the difference in gradient is less than a constant times the distance between the points.

The keyword to google for is just "Lipschitz".

Ok. I stared at it long enough, and I think I understand. Being Lipschitz-continuous means (in a non-rigorous way?) that a the gradient / slope of a function has an upper bound. And Hessian-Lipschitz means the same, but for the second derivative / hessian.

So, f(x) = x^2 is not Lipschitz-continuous (because the slope gets arbitrarily large), but something like f(x) = sin(x) is Lipschitz-continuous because the slope never exceeds some upper bound.

Funny how trying to write down the question gives the brain the kick it needs sometimes :)

That's correct.

In some situations, it's enough to prove an equation is Lipschitz-continuous on a range. Example, y=x^2 is lipschitz continuous on x=[0,1].

For assumption 1:

For every two points, the norm of the gradient is bounded by some constant times the absolute values/normed values between the two points. So the slope is bounded.

For assumption 2:

Same as assumption one, but true for the the second derivative to. So the change in slope (acceleration) is bounded. So no starts, no stops.