by dustyleary 2447 days ago
Our intuition about gradient descent has a blind spot.

For a local minimum to exist, the function we're looking at has to have a local minimum when sliced along every dimension.

In our intuition, this is relatively common, because we usually imagine 2d manifolds embedded in 3-space (i.e., "hill climbing"). And when there are only 2 dimensions, it's not hard to have a "bowl", where the function curves upward in both dimensions.

In fact, a "saddle point", where one of the dimensions curves up and the other curves down, feels "unlikely" to most people's intuition, because we don't often run into physical objects with that shape.

But, saddle points in higher dimensions "should be" far more common than local minima. You only need one dimension to curve downward to have a "hole" where the "water" can escape and continue flowing downhill.
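
A rough way to see the counting argument, as a toy sketch rather than a claim about real loss surfaces: assume (and this is purely an assumption for illustration) that at a critical point each dimension independently curves up or down with probability 1/2. The helper name `fraction_all_curving_up` is made up for this example. The fraction of critical points that are true minima then falls off like 0.5 ** n:

```python
import random

def fraction_all_curving_up(n_dims, n_trials=20_000, seed=0):
    """Estimate how often a critical point is a local minimum in a toy model.

    Toy assumption (not from the comment above): the curvature along each
    dimension is an independent coin flip, up or down.
    """
    rng = random.Random(seed)
    minima = 0
    for _ in range(n_trials):
        # A minimum needs every single dimension to curve upward.
        if all(rng.random() < 0.5 for _ in range(n_dims)):
            minima += 1
    return minima / n_trials

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        # Print the simulated fraction next to the exact value 0.5 ** n.
        print(n, fraction_all_curving_up(n), 0.5 ** n)
```

Real Hessians don't have independent signs along each dimension, so treat this only as the back-of-the-envelope version of "you only need one dimension to curve downward".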

Gradient descent easily gets stuck in 2 dimensions, and so we are often surprised by how well it performs in higher dimensions.

1 comment

Yes, saddle points are far more common than local minima in high dimensions. Unfortunately they're really good at slowing down naive gradient descent...
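
To make the slowdown concrete, here is a minimal sketch on the textbook saddle f(x, y) = x^2 - y^2, using plain fixed-step gradient descent. The function name `steps_to_escape`, the learning rate, and the escape radius are all arbitrary choices for illustration. The closer the starting point sits to the ridge y = 0, the more steps it takes before the downhill direction is noticed at all:

```python
def steps_to_escape(y0, lr=0.1, x0=1.0, escape_radius=1.0, max_steps=10_000):
    """Run naive gradient descent on f(x, y) = x**2 - y**2.

    Returns the number of steps until |y| exceeds escape_radius,
    or None if that never happens within max_steps.
    """
    x, y = x0, y0
    for step in range(1, max_steps + 1):
        # Gradient of x**2 - y**2 is (2x, -2y).
        grad_x, grad_y = 2.0 * x, -2.0 * y
        x -= lr * grad_x  # x shrinks toward the saddle at 0
        y -= lr * grad_y  # y grows away from 0, but only if it is nonzero
        if abs(y) > escape_radius:
            return step
    return None

if __name__ == "__main__":
    for y0 in (1e-2, 1e-4, 1e-8, 0.0):
        print(y0, steps_to_escape(y0))
```

Starting exactly on the ridge (y0 = 0.0) the gradient in y is zero, so this version never leaves; with a tiny offset it escapes eventually, but the step count grows as the offset shrinks.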