by Filligree 1514 days ago
Why are there so few local minima, you mean?

I think it’d have to be related to the huge number of dimensions it works on. But I have no idea how I’d even begin to prove that.

1 comment

It's not even certain that they are few. What's rather unsettling is that, with these local moves of SGD, the parameters settle on a good-enough local minimum, in spite of the fact that we know many local minima exist that have zero or near-zero training loss. There are glimmers of insight here and there, but the thing is yet to be fully understood.
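To make the "many minima with zero training loss" point concrete, here is a toy sketch. It rests on assumptions that are mine, not the thread's: it uses an overparameterized linear model (50 parameters, 5 data points) rather than a neural net, and plain full-batch gradient descent rather than SGD. The names `fit` and `loss` are made up for illustration. In this linear case the zero-loss points are all global minima, so this only shows the "many different solutions fit the training data" part, not anything about generalization.

```python
import numpy as np

def fit(X, y, w0, lr=0.01, steps=5000):
    """Full-batch gradient descent on mean squared error, starting from w0."""
    w = w0.copy()
    n = len(y)
    for _ in range(steps):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of mean((Xw - y)^2)
        w -= lr * grad
    return w

def loss(X, y, w):
    """Mean squared training error."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
n_samples, n_params = 5, 50  # assumption: far more parameters than data points
X = rng.normal(size=(n_samples, n_params))
y = rng.normal(size=n_samples)

# Two runs that differ only in their random starting point.
w_a = fit(X, y, rng.normal(size=n_params))
w_b = fit(X, y, rng.normal(size=n_params))

print("training loss, run a:", loss(X, y, w_a))
print("training loss, run b:", loss(X, y, w_b))
print("distance between the two solutions:", np.linalg.norm(w_a - w_b))
```

Both runs should drive the training loss to essentially zero while ending up at clearly different parameter vectors, since gradient descent never moves the part of the starting point that lies in the null space of `X`.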