by currymj 2733 days ago
As others have said, you don't actually want the global optimum of a neural network's training loss, because reaching it would mean terrible overfitting. There is some evidence, though, that architectural tricks that empirically help performance (like ResNet's skip connections) make the loss landscape "more convex".

https://arxiv.org/pdf/1712.09913.pdf
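
To make the overfitting point concrete, here is a toy sketch. It is not a neural network: a degree-9 polynomial stands in for an over-parameterized model, and the data (noisy samples of sin(2*pi*x)), the noise level, and all names are assumptions chosen for illustration. The interpolating polynomial is a global optimum of the training loss, since it passes through every training point, yet it still misses the underlying function on held-out inputs.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Ten noisy training samples of sin(2*pi*x) on [0, 1] (assumed toy data).
n = 10
train_x = [i / (n - 1) for i in range(n)]
train_y = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in train_x]


def interpolant(x):
    """Lagrange form of the degree-9 polynomial through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mse(xs, ys):
    """Mean squared error of the interpolant on the given points."""
    return sum((interpolant(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Held-out inputs: midpoints between training inputs, with noise-free targets.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [math.sin(2 * math.pi * x) for x in test_x]

train_loss = mse(train_x, train_y)
test_loss = mse(test_x, test_y)
print(f"train MSE: {train_loss:.3g}")
print(f"test MSE:  {test_loss:.3g}")
```

The training error is driven to zero because the model has fit the noise exactly; the held-out error is what you actually care about, and it stays above zero.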