HN Mirror



	by thentherewere2 1490 days ago
	This is the case for most contemporary neural networks as well. It turns out that for many domains, a "good" local minimum generalizes well across many tasks.


dekhn 1490 days ago

Huh. I talked to some experts and they told me NN loss functions are bowl-shaped and have a single minimum, but that the minimum takes a very long time to navigate to in high-dimensional spaces.


Salgat 1490 days ago

For higher feature counts the real concern is saddle points rather than local minima: near a saddle point the gradient is so small that you barely move at all each iteration and get "stuck".
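
A minimal sketch of that "stuck" behavior, assuming a toy two-parameter loss f(x, y) = x**2 - y**2 (not a real network loss, and unbounded below) with a saddle at the origin. The function names, starting point, and learning rate are illustrative choices. Plain gradient descent slides toward the saddle along x, spends many iterations with a tiny gradient, and only then escapes along y.

```python
import numpy as np

def grad(p):
    # Gradient of the toy loss f(x, y) = x**2 - y**2 (saddle at the origin).
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(start, lr=0.1, steps=120):
    # Plain gradient descent; records the gradient norm seen before each step.
    p = np.array(start, dtype=float)
    norms = []
    for _ in range(steps):
        g = grad(p)
        norms.append(float(np.linalg.norm(g)))
        p = p - lr * g
    return p, norms

# Start almost exactly on the ridge that leads into the saddle (assumed start).
final, norms = descend((1.0, 1e-8))
slow_steps = sum(n < 1e-2 for n in norms)

print("smallest gradient norm:", min(norms))
print("steps with gradient norm below 1e-2:", slow_steps)
print("final point:", final)
```

The closer the starting y is to zero, the longer the run stays in the flat region before the downhill direction takes over.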


timomo 1490 days ago

To add here: for a local minimum to occur, the loss has to increase along all of those dimensions (or features) at once. This is highly unlikely for modern NNs, where you have millions of dimensions. If the loss goes down along one dimension but up along the rest, you have a saddle point. Since you can descend along only one (or a few) dimensions, progress takes longer.
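
A rough numerical illustration of the counting argument, under a loud assumption: the Hessian at a critical point is modeled as a random symmetric Gaussian matrix, which real network Hessians are not. A critical point is a local minimum only if every eigenvalue of the Hessian is positive, so the sketch estimates how often that happens as the dimension grows. `fraction_all_positive` is a hypothetical helper name.

```python
import numpy as np

def fraction_all_positive(dim, trials=2000, seed=0):
    # Estimate how often a random symmetric matrix has only positive
    # eigenvalues, i.e. how often a critical point with that "Hessian"
    # would be a local minimum rather than a saddle (or maximum).
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((dim, dim))
        h = (a + a.T) / 2.0  # symmetrize: stand-in for a Hessian (assumption)
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    return count / trials

for dim in (1, 2, 4, 8):
    print(dim, fraction_all_positive(dim))
```

Under this model the fraction starts near one half in one dimension and falls off quickly as dimensions are added, which is the intuition behind saddle points dominating in large networks.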
