by fwilliams 3093 days ago
There is literature on Quasi-Newton and Krylov Subspace methods for training Neural Networks. For example, https://dl.acm.org/citation.cfm?id=3104516.

I think the primary reason such methods are not used much in practice is memory and computational cost: each function evaluation is expensive, and you need to solve a very large linear system at every iteration.
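
To put rough numbers on the memory side, here is a back-of-the-envelope sketch. It assumes the Hessian is formed explicitly as a dense matrix of 4-byte floats, which is exactly what Krylov and limited-memory quasi-Newton methods try to avoid. The function names and the parameter count are made up for illustration.

```python
def dense_hessian_bytes(num_params, bytes_per_entry=4):
    """Bytes needed to store a dense num_params x num_params Hessian."""
    return num_params * num_params * bytes_per_entry


def gradient_bytes(num_params, bytes_per_entry=4):
    """Bytes needed to store one gradient vector of length num_params."""
    return num_params * bytes_per_entry


# Hypothetical model size, chosen only for illustration.
n = 1_000_000
print(dense_hessian_bytes(n) / 1e12, "TB for a dense Hessian")
print(gradient_bytes(n) / 1e6, "MB for a gradient")
```

Storage grows quadratically with the number of parameters while the gradient grows linearly, so a first-order method's memory footprint stays close to that of the model itself.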

Also, to reply to a sibling comment: you can add momentum and step-length adjustments to second-order methods in much the same way as in steepest descent to help escape saddles. The only difference is how the descent direction is chosen.
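
A minimal sketch of that last point, on a toy problem I made up (a 2D ill-conditioned quadratic with a diagonal Hessian, so the Newton solve is trivial). Both methods share the same backtracking step-length loop and differ only in the function that returns the direction. None of this comes from the linked paper, and momentum could be added to the same loop in the same way.

```python
def f(x):
    # Hypothetical test function: f(x) = 0.5 * (x0^2 + 100 * x1^2)
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def grad(x):
    return [x[0], 100.0 * x[1]]


def steepest_descent_direction(x):
    # First-order choice: move along the negative gradient.
    g = grad(x)
    return [-g[0], -g[1]]


def newton_direction(x):
    # Second-order choice: solve H p = -g. Here H = diag(1, 100),
    # so the solve reduces to elementwise division.
    g = grad(x)
    return [-g[0] / 1.0, -g[1] / 100.0]


def minimize(direction_fn, x, iters=50, shrink=0.5, c=1e-4):
    """Generic descent loop; only direction_fn differs between methods."""
    for _ in range(iters):
        p = direction_fn(x)
        g = grad(x)
        slope = g[0] * p[0] + g[1] * p[1]
        step = 1.0
        # Backtracking line search (Armijo condition), shared by both methods.
        while f([x[0] + step * p[0], x[1] + step * p[1]]) > f(x) + c * step * slope:
            step *= shrink
        x = [x[0] + step * p[0], x[1] + step * p[1]]
    return x


start = [1.0, 1.0]
print("steepest descent:", f(minimize(steepest_descent_direction, start)))
print("newton:", f(minimize(newton_direction, start)))
```

On a real network the `newton_direction` step is the expensive part, since the Hessian is neither diagonal nor small.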

1 comment

This is correct: second-order methods are great in theory, but they are generally computationally prohibitive for high-dimensional problems.