by fwilliams
3093 days ago
There is literature on quasi-Newton and Krylov subspace methods for training neural networks; see, for example, https://dl.acm.org/citation.cfm?id=3104516. I think the primary reason such methods are not used much in practice is memory and computational cost: each function evaluation is expensive, and you need to solve a very large linear system at every iteration. Also, to reply to a sibling comment: you can add momentum and step-length adjustments to second-order methods in much the same way as in steepest descent to help escape saddles. The only difference is how the descent direction is chosen.
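To make that last point concrete, here is a rough sketch (not taken from the linked paper) of one update loop with momentum and a fixed step length, where the only thing swapped is the function that picks the descent direction. The toy quadratic and all names (`grad`, `hessian`, `solve2`, `steepest_direction`, `newton_direction`, `minimize`) are made up for illustration, and the 2x2 solve stands in for the very large system you would face with a real network.

```python
# Toy objective (an assumption for illustration): f(x, y) = 0.5 * (10*x^2 + y^2),
# an ill-conditioned quadratic bowl with its minimum at the origin.

def grad(x):
    # Gradient of the toy objective.
    return [10.0 * x[0], 1.0 * x[1]]

def hessian(x):
    # Hessian of the toy objective (constant for a quadratic).
    return [[10.0, 0.0], [0.0, 1.0]]

def solve2(H, g):
    # Solve the 2x2 linear system H d = g with Cramer's rule.
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [
        (g[0] * H[1][1] - H[0][1] * g[1]) / det,
        (H[0][0] * g[1] - g[0] * H[1][0]) / det,
    ]

def steepest_direction(x):
    # First-order choice: the negative gradient.
    g = grad(x)
    return [-g[0], -g[1]]

def newton_direction(x):
    # Second-order choice: solve H d = g, then step along -d.
    d = solve2(hessian(x), grad(x))
    return [-d[0], -d[1]]

def minimize(direction, x0, step, momentum=0.5, iters=50):
    # Same heavy-ball style loop for both methods; only `direction` differs.
    x = list(x0)
    v = [0.0, 0.0]
    for _ in range(iters):
        d = direction(x)
        v = [momentum * vi + step * di for vi, di in zip(v, d)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

x_sd = minimize(steepest_direction, [1.0, 1.0], step=0.05)
x_newton = minimize(newton_direction, [1.0, 1.0], step=0.5)
print("steepest descent:", x_sd)
print("newton:", x_newton)
```

On this toy problem both runs head toward the origin, and the Newton direction gets closer in the same number of iterations because it rescales the poorly conditioned axis. The cost is the `solve2` call, which is exactly the part that becomes expensive at neural-network scale.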