by mjw
3681 days ago
My main quibble with this paper is:

> For deeper networks, Corollary 2.4 states that there exist “bad” saddle points in the sense that the Hessian at the point has no negative eigenvalue.

To me these sound just as bad as local minima. Also, I don't think it's standard to call something a saddle point unless the Hessian has both negative and positive eigenvalues. Otherwise there's no "saddle", just something more like a valley or plateau.

They claim that these can be escaped with some perturbation:

> From the proof of Theorem 2.3, we see that some perturbation is sufficient to escape such bad saddle points.

I haven't read through the (long!) proof in detail, but it doesn't seem obvious to me why these would be any easier to escape via perturbation than a local minimum would be. I think this could use some extra explanation, as it seems like an important point for the result to be useful. Did anyone figure this bit out?
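For what it's worth, here is a toy illustration of how a critical point whose Hessian has no negative eigenvalue can still differ from a local minimum. It uses f(x, y) = x^2 + y^3, which is my own example and not taken from the paper; the function names, step size, and perturbation size are all assumptions chosen for illustration. At the origin the Hessian has eigenvalues 0 and 2, yet f decreases along negative y, so a small nudge in that direction lets plain gradient descent leave. Whether the paper's "bad" saddle points behave like this is exactly the open question, so treat this as a sketch only.

```python
import numpy as np

# Toy function (an assumption for illustration, not from the paper).
def f(p):
    x, y = p
    return x**2 + y**3

def grad(p):
    x, y = p
    return np.array([2.0 * x, 3.0 * y**2])

def hessian(p):
    x, y = p
    return np.array([[2.0, 0.0], [0.0, 6.0 * y]])

def descend(p, lr=0.1, steps=200, floor=-1.0):
    """Plain gradient descent; stops early once f drops below `floor`."""
    p = np.array(p, dtype=float)
    for _ in range(steps):
        if f(p) < floor:
            break
        p = p - lr * grad(p)
    return p

origin = np.array([0.0, 0.0])

# Hessian at the origin: eigenvalues 0 and 2, so no negative eigenvalue.
eigs = np.linalg.eigvalsh(hessian(origin))

# Started exactly at the origin, the gradient is zero and nothing moves.
stuck = descend(origin)

# A small nudge toward negative y is enough to leave the critical point.
escaped = descend(origin + np.array([0.0, -0.1]))

print("Hessian eigenvalues at origin:", eigs)
print("unperturbed end point:", stuck, "f =", f(stuck))
print("perturbed end point:", escaped, "f =", f(escaped))
```

Note that the escape depends on the direction of the nudge: perturbing toward positive y just drifts slowly back to the origin. A true local minimum has no descending direction at all, which is the distinction that matters here.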