Hacker News
by blackbear_ 2074 days ago
These two papers do not necessarily contradict each other, though perhaps my description was a bit sloppy.

Sagun et al. (and derivative works) focus only on the Hessian along the trajectory followed by gradient descent, while Li et al. take a broader look at the loss surface as a whole.
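
To make the distinction concrete, here is a rough toy sketch in NumPy. Everything in it is my own assumption for illustration (the 2D loss, the step size, the helper names `trajectory_view` and `surface_view`); it is not the method of either paper, just the two "views" applied to a tiny non-convex function.

```python
import numpy as np

# Hypothetical toy loss with two basins; chosen only for illustration.
def loss(w):
    x, y = w
    return (x**2 - 1) ** 2 + 0.5 * y**2 + 0.3 * x * y

def grad(w):
    x, y = w
    return np.array([4 * x * (x**2 - 1) + 0.3 * y, y + 0.3 * x])

def hessian(w):
    x, _ = w
    return np.array([[12 * x**2 - 4, 0.3], [0.3, 1.0]])

def trajectory_view(w0, lr=0.05, steps=200):
    """Run plain gradient descent and record the Hessian eigenvalues
    at every iterate (curvature only where the optimizer actually goes)."""
    w = np.asarray(w0, dtype=float)
    eigs = []
    for _ in range(steps):
        eigs.append(np.linalg.eigvalsh(hessian(w)))
        w = w - lr * grad(w)
    return w, np.array(eigs)

def surface_view(center, radius=2.5, n=41):
    """Evaluate the loss on a grid around a point, regardless of whether
    the optimizer ever visited those locations."""
    offsets = np.linspace(-radius, radius, n)
    grid = np.array(
        [[loss(center + np.array([dx, dy])) for dx in offsets] for dy in offsets]
    )
    return offsets, grid

w_final, eigs = trajectory_view([1.5, 1.0])
offsets, grid = surface_view(w_final)

# Curvature seen along the path vs. curvature at a point the path never reaches.
print("min eigenvalue along trajectory:", eigs.min())
print("min eigenvalue at the origin:", np.linalg.eigvalsh(hessian([0.0, 0.0])).min())
```

In this toy case the path started at (1.5, 1.0) stays inside one basin, so the trajectory view only ever sees positive curvature, while the grid also covers the region around x = 0 where the Hessian has a negative eigenvalue. Two analyses can therefore report different things about the same loss without contradicting each other.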