Hacker News
by mjw 3681 days ago
My main quibble from this paper is:

> For deeper networks, Corollary 2.4 states that there exist “bad” saddle points in the sense that the Hessian at the point has no negative eigenvalue.

To me these sound just as bad as local minima. Also, I don't think it's standard to call something a saddle point unless the Hessian has negative as well as positive eigenvalues. Otherwise there's no "saddle"; it's more like a valley or plateau.
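The eigenvalue criterion above is the standard second-derivative test. Below is a minimal sketch for functions of two variables, assuming the Hessian at the critical point is given as a symmetric 2x2 matrix [[a, b], [b, c]]; the names `hessian_eigenvalues` and `classify_critical_point` are hypothetical. A Hessian with no negative eigenvalue that is not a strict minimum must have a zero eigenvalue, and that lands in the "inconclusive" branch, where the eigenvalues alone don't decide the question.

```python
import math

def hessian_eigenvalues(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def classify_critical_point(a: float, b: float, c: float, tol: float = 1e-12) -> str:
    """Second-derivative test at a critical point with Hessian [[a, b], [b, c]]."""
    lo, hi = hessian_eigenvalues(a, b, c)
    if lo < -tol and hi > tol:
        return "saddle"  # curves down in one direction, up in another
    if lo > tol:
        return "strict local minimum"
    if hi < -tol:
        return "strict local maximum"
    return "inconclusive"  # some eigenvalue is (numerically) zero

# x^2 - y^2 at the origin: Hessian [[2, 0], [0, -2]].
print(classify_critical_point(2, 0, -2))  # saddle
# x^2 + y^2 at the origin: Hessian [[2, 0], [0, 2]].
print(classify_critical_point(2, 0, 2))   # strict local minimum
# Zero Hessian: the test says nothing either way.
print(classify_critical_point(0, 0, 0))   # inconclusive
```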

They claim that these can be escaped with some perturbation:

> From the proof of Theorem 2.3, we see that some perturbation is sufficient to escape such bad saddle points.

I haven't read through the (long!) proof in detail, but it isn't obvious to me why these would be any easier to escape via perturbation than a local minimum would be. This seems like an important point for the result to be useful, so I think it could use some extra explanation. Did anyone figure this bit out?


A saddle is a critical point that's not a local extremum; the Hessian could just be zero, as for x^4 - y^4 at (0,0).
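The x^4 - y^4 example can be checked directly. A small standard-library sketch (the helper names are illustrative): both the gradient and the Hessian vanish at the origin, yet f is positive along the x-axis and negative along the y-axis at every distance, so the origin is a critical point that is neither a minimum nor a maximum.

```python
def f(x: float, y: float) -> float:
    """The example from the reply: f(x, y) = x^4 - y^4."""
    return x**4 - y**4

def gradient(x: float, y: float) -> tuple[float, float]:
    """Analytic gradient (df/dx, df/dy)."""
    return 4 * x**3, -4 * y**3

def hessian(x: float, y: float) -> list[list[float]]:
    """Analytic Hessian; the mixed partials are zero."""
    return [[12 * x**2, 0.0], [0.0, -12 * y**2]]

def changes_sign_within(r: float) -> bool:
    """True if f takes both signs at points at distance r from the origin."""
    return f(r, 0.0) > 0 and f(0.0, r) < 0

# The origin is a critical point with an all-zero Hessian ...
print(gradient(0.0, 0.0) == (0.0, 0.0))                       # True
print(all(v == 0 for row in hessian(0.0, 0.0) for v in row))  # True
# ... but not an extremum: f changes sign arbitrarily close to it.
print(all(changes_sign_within(r) for r in (1.0, 1e-1, 1e-2, 1e-3)))  # True
```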
Ah yep, true. I'd forgotten you can still get the saddle effect from higher-order derivatives; the Hessian eigenvalues aren't enough to characterise it.

I was thinking of examples like (x-y)^2 at zero, although I guess that's still a local minimum, just not a strict one: every neighbourhood of the origin contains other minima along the line x = y.
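For (x - y)^2 the Hessian is the constant matrix [[2, -2], [-2, 2]], whose eigenvalues are 0 and 4, so there is no negative eigenvalue but the second-derivative test is still inconclusive. A short standard-library sketch of both facts (the names `g` and `eigenvalues` are illustrative): the function is never negative, so the origin is a minimum, but it is zero along the whole line x = y, so that minimum is not isolated.

```python
import math

def g(x: float, y: float) -> float:
    """The example from the comment: g(x, y) = (x - y)^2."""
    return (x - y) ** 2

# Constant Hessian of g: [[a, b], [b, c]] = [[2, -2], [-2, 2]].
a, b, c = 2.0, -2.0, 2.0

# Closed-form eigenvalues of a symmetric 2x2 matrix.
mean = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
eigenvalues = (mean - radius, mean + radius)
print(eigenvalues)  # (0.0, 4.0)

# g >= 0 everywhere, so the origin is a minimum, but g also vanishes at
# points (t, t) arbitrarily close to the origin: the minimum is not strict.
print(all(g(t, t) == 0.0 for t in (1.0, 1e-3, 1e-9)))  # True
```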