by chillee
3100 days ago
That's why I said high-dimensional neural networks. There's a lot of literature explaining why local minima aren't a problem on very high-dimensional loss surfaces. Check any of the literature on this subject:
https://arxiv.org/abs/1611.06310v2
https://arxiv.org/abs/1406.2572

Local minima are something people thought were going to be a problem, especially back in the 2000s. They played around with small neural nets on toy examples like yours and concluded that training them was intractable. It's the entire reason neural nets fell out of fashion in the early 2000s and people moved toward techniques like SVMs. These toy examples don't generalize to high dimensions, and if you look at the literature, you'll see that the consensus agrees with my statement.
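One rough heuristic behind the saddle-point argument (e.g. 1406.2572) is that at a critical point of a high-dimensional function, the Hessian almost never has all positive eigenvalues, so most critical points are saddles rather than minima. The sketch below illustrates only that heuristic, under a big assumption: it models the Hessian as a random symmetric Gaussian (GOE-like) matrix, which real network Hessians are not. The function name `fraction_minima` is hypothetical.

```python
import numpy as np

def fraction_minima(n, trials=2000, seed=0):
    """Estimate how often a random symmetric n x n 'Hessian' is positive definite,
    i.e. how often a random critical point would be a local minimum.
    Assumption: Hessian entries are i.i.d. Gaussian (a GOE-like toy model)."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / 2  # symmetrize so the eigenvalues are real
        if np.all(np.linalg.eigvalsh(h) > 0):  # all curvatures positive -> minimum
            count += 1
    return count / trials

for n in (1, 2, 5, 10):
    print(n, fraction_minima(n))
```

Under this toy model the fraction is about 0.5 for n = 1 and roughly 0.15 for n = 2, and it collapses toward zero within a handful of dimensions, far below the millions of parameters in a real network.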
Maybe with a billion neurons, some of them would correspond to the correct algorithm just by random chance and get reinforced by backprop. But very few NNs have layers wider than a thousand neurons, because the cost of a layer that wide grows quadratically with its width. And the chance of random weights finding the solution decreases exponentially.
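The two scaling claims can be made concrete with a little arithmetic. This is a minimal sketch assuming a plain fully connected layer (weights plus biases) and a hypothetical toy model in which a "solution" requires k independent weights to each land on a correct setting with probability p; both function names are illustrative, not from any library.

```python
def dense_layer_params(n_in, n_out):
    """Weights plus biases of a fully connected layer: n_in * n_out + n_out."""
    return n_in * n_out + n_out

def random_guess_probability(k, p=0.5):
    """Hypothetical toy: chance that k independent random weights all hit a
    specific 'correct' setting, when each does so with probability p."""
    return p ** k

print(dense_layer_params(1000, 1000))  # 1001000
print(dense_layer_params(2000, 2000))  # 4002000
print(random_guess_probability(10))    # 0.0009765625
```

Doubling the width of two connected layers roughly quadruples the weight count, while each additional weight that must be "right" multiplies the chance of a random hit by p.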
One of the biggest reasons techniques like stochastic gradient descent and dropout are used is that they help break out of local minima.
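The intuition that gradient noise helps escape shallow minima can be shown with a 1-D cartoon. This is a sketch under loud assumptions: the loss `f` is a made-up double well, and minibatch noise is modeled as isotropic Gaussian noise with a linearly decaying temperature (closer to Langevin dynamics or simulated annealing than to real SGD). Dropout is not modeled at all. Plain gradient descent started at x = 2 settles in the shallow right-hand minimum; the noisy runs mostly end up in the deeper left-hand one.

```python
import numpy as np

def f(x):
    # Toy 1-D "loss": global minimum near x = -1.30, shallower local minimum near x = 1.13
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Exact derivative of f
    return 4 * x**3 - 6 * x + 1

def descend(noise_scale, steps=5000, lr=0.01, n_runs=50, x0=2.0, seed=0):
    """Run n_runs independent descents from x0 and return the final positions.

    noise_scale = 0 gives plain gradient descent. Otherwise Gaussian noise is
    added to every step (a Langevin-style stand-in for minibatch noise), with a
    temperature that decays linearly to zero so each run eventually settles.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, x0)
    for k in range(steps):
        temperature = noise_scale * (1 - k / steps)
        noise = np.sqrt(2 * lr * temperature) * rng.standard_normal(n_runs)
        x = x - lr * grad_f(x) + noise
    return x

plain = descend(noise_scale=0.0)
noisy = descend(noise_scale=1.0)
print("plain GD: fraction in global basin", np.mean(plain < 0), "mean loss", f(plain).mean())
print("noisy GD: fraction in global basin", np.mean(noisy < 0), "mean loss", f(noisy).mean())
```

This is a low-dimensional picture of the escape mechanism only; it says nothing about how often such minima actually occur in high-dimensional networks.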