by T_D_K
3218 days ago
I'm getting stuck trying to understand the equations in assumptions 1 and 2. Can anyone point me to a resource that explains the idea behind them? Wikipedia is a bit terse, and I'm not having any luck googling for "gradient-Lipschitz and Hessian-Lipschitz" and variations. On the notation side, am I correct in thinking that ∇f(x_n) is the partial derivative w.r.t. x_n, and that the elements of the vector x are the parameters against which a "cost function" (f) is computed? But that doesn't seem right. Maybe x_n is a point in R^N, and therefore ∇f(x_n) is the derivative at that point?
|
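The first equation says that for any two points in R^N, the norm of the difference between the gradients at those points is at most a constant times the distance between the points. So yes, each x is a point in R^N, and ∇f(x) is the gradient (the vector of all partial derivatives) evaluated at that point.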
The keyword to google for is just "Lipschitz".
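
If it helps to see the condition numerically, here is a minimal sketch. It assumes a made-up quadratic f(x) = 0.5 * (1*x_1^2 + 4*x_2^2) on R^2, which is not from the paper, and the names COEFFS, grad_f and max_gradient_ratio are just illustrative. It samples random pairs of points and tracks the largest ratio of "gradient difference" to "point distance". For this particular f the ratio can never exceed 4, so 4 works as the Lipschitz constant of the gradient.

```python
import math
import random

# Hypothetical example function (an assumption, not from the paper):
# f(x) = 0.5 * (1*x_1^2 + 4*x_2^2), so grad f(x) = (1*x_1, 4*x_2).
COEFFS = (1.0, 4.0)


def grad_f(x):
    """Gradient of f evaluated at the point x (a tuple in R^2)."""
    return tuple(a * xi for a, xi in zip(COEFFS, x))


def norm(v):
    """Euclidean norm of a vector given as a tuple."""
    return math.sqrt(sum(vi * vi for vi in v))


def max_gradient_ratio(num_pairs=10_000, seed=0):
    """Largest observed ||grad f(x) - grad f(y)|| / ||x - y|| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_pairs):
        x = tuple(rng.uniform(-10, 10) for _ in COEFFS)
        y = tuple(rng.uniform(-10, 10) for _ in COEFFS)
        dist = norm(tuple(xi - yi for xi, yi in zip(x, y)))
        if dist == 0.0:
            continue  # identical points say nothing about the ratio
        grad_diff = norm(tuple(gx - gy for gx, gy in zip(grad_f(x), grad_f(y))))
        worst = max(worst, grad_diff / dist)
    return worst


if __name__ == "__main__":
    # The observed ratio stays at or below the largest coefficient, 4.
    print(max_gradient_ratio())
```

Sampling only gives a lower bound on the true constant, so this is a sanity check rather than a proof. A function like f(x) = x^4 would fail the check on all of R, since its gradient differences grow faster than any fixed multiple of the distance.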