by WithinReason 1657 days ago
I checked the paper. The second-order method actually reaches a minimum that is 10 times worse (it's much worse at minimisation), and the results are better because the network overfits less (Figure 3). The reviewers should have caught this!