by shoyer 1658 days ago
> Neural networks need completely different optimisation methods, and there is no practically useful application of any of the Newton or Quasi-Newton methods for their optimisation.

I don't think this is quite fair. There are several variations of 2nd order methods, notably KFAC and Shampoo, that seem to be quite effective for large-scale neural network training; e.g., see the intro of this paper for an overview: https://openreview.net/forum?id=-t9LPHRYKmi

2 comments

I checked the paper. The 2nd order method actually reaches a minimum that is 10 times worse (it's much worse at minimisation), and the results are better because the network overfits less (Figure 3). The reviewers should have caught this!
One of the authors is Donald Goldfarb, who is the G in BFGS, so maybe they are onto something. But I'm always suspicious about whether the tests shown in a paper are fair.