by ScottBurson
3017 days ago
I think it's a combination of (1) and (2): in a high-dimensional space, for a critical point to be a local optimum, the loss has to curve the same way in every dimension, and the probability of that falls off exponentially in the number of dimensions. So in practice, they're essentially all saddles.
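A rough way to see the effect numerically, as a sketch rather than a proof: treat the Hessian at a critical point as a random symmetric Gaussian matrix and count how often every eigenvalue is positive. That random-matrix model is an assumption made purely for illustration (a real network's Hessian is not distributed this way), and the function name `fraction_all_positive` is made up for this example.

```python
import numpy as np


def fraction_all_positive(n, trials=20000, seed=0):
    """Estimate how often a random symmetric n x n matrix has only
    positive eigenvalues, i.e. would look like a local minimum.

    Assumption: the Hessian is modeled as (A + A^T) / 2 with A having
    i.i.d. standard normal entries. This is a toy stand-in, not a
    claim about real loss surfaces.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((trials, n, n))
    # Symmetrize each matrix in the batch so its eigenvalues are real.
    h = (a + np.swapaxes(a, 1, 2)) / 2.0
    # eigvalsh handles the whole batch; result has shape (trials, n).
    eigenvalues = np.linalg.eigvalsh(h)
    return float(np.mean(np.all(eigenvalues > 0, axis=1)))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 6, 8):
        print(n, fraction_all_positive(n))
```

With one dimension the fraction comes out near one half, and it shrinks quickly as `n` grows. In this toy model the eigenvalues are correlated, so the drop is steeper than the 2^-n you would get from an independent coin flip per direction.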
Further, the mind-boggling size of these high-dimensional spaces makes me all but certain that not a single non-trivial neural network made by Homo sapiens has ever reached a global optimum. But I have nothing but my hunch on this.