by muppet_frog 2069 days ago
This paper makes the point that it's the saddle points, not local minima, that are the problem: https://arxiv.org/abs/1406.2572 Momentum itself predates the paper (what it actually proposes is a saddle-free Newton method), but a common intuition for why momentum helps is that it lets you skate across the saddles.
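
A rough sketch of the "skate across the saddles" intuition, not anything taken from the paper: the toy function f(x, y) = x**2 - y**2, the start point, the learning rate, and the helper name steps_to_escape are all assumptions made up for illustration. It counts how many steps plain gradient descent and heavy-ball momentum need to leave the neighbourhood of the saddle at the origin.

```python
def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2 - y**2 (saddle at the origin).
    return 2.0 * x, -2.0 * y


def steps_to_escape(lr=0.01, beta=0.0, start=(1.0, 1e-6),
                    threshold=1.0, max_steps=100_000):
    """Count updates until |y| exceeds threshold (hypothetical helper).

    beta=0.0 is plain gradient descent; beta>0 is heavy-ball momentum.
    """
    x, y = start
    vx = vy = 0.0
    for step in range(1, max_steps + 1):
        gx, gy = grad(x, y)
        # Velocity accumulates a decaying sum of past gradient steps.
        vx = beta * vx - lr * gx
        vy = beta * vy - lr * gy
        x += vx
        y += vy
        if abs(y) > threshold:
            return step
    return max_steps


plain = steps_to_escape(beta=0.0)
heavy = steps_to_escape(beta=0.9)
print("plain GD steps:", plain)
print("momentum steps:", heavy)
```

On this toy problem the momentum run should leave the flat region near the saddle in noticeably fewer steps, because the velocity keeps building along the escaping direction while the gradient there is still tiny. It says nothing about how this plays out in high dimensions.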