by Salgat 1495 days ago
For higher feature counts, the real concern is saddle points rather than local minima: near a saddle point the gradient is so small that you barely move at all each iteration and get "stuck".
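
A rough way to see the "stuck" effect, as a toy sketch rather than anything about a real network: plain gradient descent on f(x, y) = x**2 - y**2, which has a saddle at (0, 0). The function, the learning rate, the threshold and the name `steps_to_escape` are all assumptions made up for illustration. The closer the start is to the x-axis, the more steps it takes before y grows enough to leave the saddle.

```python
def steps_to_escape(y0, lr=0.1, threshold=1.0, max_steps=10_000):
    """Gradient descent on the toy surface f(x, y) = x**2 - y**2.

    (0, 0) is a saddle: f curves up along x and down along y.
    Returns how many steps pass before |y| reaches `threshold`,
    i.e. before the iterate has visibly left the saddle region.
    """
    x, y = 1.0, y0
    for step in range(1, max_steps + 1):
        grad_x, grad_y = 2 * x, -2 * y  # analytic gradient of f
        x -= lr * grad_x                # x shrinks toward 0 (the "up" direction)
        y -= lr * grad_y                # y grows away from 0 (the "down" direction)
        if abs(y) >= threshold:
            return step
    return max_steps


# Smaller starting offsets mean a smaller gradient along y, so more steps.
for y0 in (1e-2, 1e-4, 1e-8):
    print(y0, steps_to_escape(y0))
```

The toy function is unbounded below, so "escaping" here just means leaving the flat region; it only illustrates why progress near a saddle is slow.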

To add here: for a local minimum to occur, the loss has to increase along every one of those dimensions (or features) at once. This is highly unlikely for modern NNs, where you have millions of dimensions. If the loss goes down along one dimension but up along the rest, you have a saddle point. Since you can only descend along that one dimension (or a few), getting away from it takes longer.
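
A back-of-the-envelope sketch of the "highly unlikely" part, under a deliberately crude assumption: at a critical point, each dimension independently curves up with probability `p_up` (0.5 here). Real Hessian eigenvalues are correlated, so this is not a model of an actual NN, and `fraction_minima` is a made-up helper. Under that assumption the chance that every dimension curves up is `p_up ** n_dims`, which collapses quickly as dimensions are added.

```python
import random


def fraction_minima(n_dims, trials=20_000, p_up=0.5, seed=0):
    """Estimate how often a random critical point is a local minimum.

    Assumption (for illustration only): each dimension independently
    curves upward with probability `p_up`. A point counts as a minimum
    only if all `n_dims` dimensions curve upward; otherwise it is a
    saddle (or a maximum).
    """
    rng = random.Random(seed)
    minima = 0
    for _ in range(trials):
        if all(rng.random() < p_up for _ in range(n_dims)):
            minima += 1
    return minima / trials


# Compare the simulated fraction with the closed form p_up ** n_dims.
for n in (1, 2, 5, 10, 20):
    print(n, fraction_minima(n), 0.5 ** n)
```

With millions of dimensions the same arithmetic gives a number indistinguishable from zero, which is the intuition behind almost every critical point being a saddle.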