by muppet_frog
2075 days ago
...hmm, that was counter to my understanding (limited though it may be...) which was partially formed by this paper:
https://arxiv.org/abs/1712.09913 TLDR - loss landscapes are nasty, but you can tame them with skip connections.
Sagun et al. (and derivative works) focus only on the Hessian along the trajectory followed by gradient descent, while Li et al. take a broader look at the loss surface as a whole.
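
To make the "broader look" idea a bit more concrete, here is a rough sketch of evaluating a loss on a 2D slice through parameter space around a fixed point, rather than only at the points an optimizer visits. Everything here is an assumption for illustration: `loss` is a made-up toy function standing in for a network's training loss, the names `random_direction` and `loss_slice` are hypothetical, and rescaling each direction to the norm of the whole weight vector is only a crude stand-in for the per-filter normalization described in the paper.

```python
import math
import random


def loss(w):
    # Hypothetical toy non-convex loss; NOT a real network's training loss.
    return sum(x * x for x in w) + sum(math.sin(3.0 * x) for x in w)


def random_direction(w, rng):
    # Gaussian direction rescaled to have the same norm as w.
    # Assumption: whole-vector rescaling, a simplification of
    # per-filter normalization.
    d = [rng.gauss(0.0, 1.0) for _ in w]
    d_norm = math.sqrt(sum(x * x for x in d))
    w_norm = math.sqrt(sum(x * x for x in w))
    return [x * w_norm / d_norm for x in d]


def loss_slice(w, d1, d2, steps=5, radius=1.0):
    # Evaluate loss(w + a*d1 + b*d2) on a (2*steps + 1) x (2*steps + 1) grid,
    # with a and b ranging over [-radius, radius].
    coords = [radius * i / steps for i in range(-steps, steps + 1)]
    return [
        [loss([wi + a * x + b * y for wi, x, y in zip(w, d1, d2)]) for b in coords]
        for a in coords
    ]


rng = random.Random(0)
w = [0.5, -1.0, 0.25, 2.0]   # stand-in for trained weights
d1 = random_direction(w, rng)
d2 = random_direction(w, rng)
grid = loss_slice(w, d1, d2)  # grid[steps][steps] is the loss at w itself
```

The grid can then be handed to any contour or surface plotter. A trajectory-based Hessian analysis would instead look at local curvature only at the points gradient descent actually passes through, so the two views can reasonably disagree about how "nasty" the landscape looks.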