|
|
|
|
|
by PartiallyTyped
1526 days ago
|
|
Well, quadratic convergence usually requires the Hessian, or an approximation of it, and that's difficult to get in deep learning due to memory constrains, and difficulty computing second order derivatives. Computing the derivatives is not very difficult with e.g. Jax, but ... you get back to the memory issue. The Hessian is a square matrix, so in Deep Learning, if we have a million of parameters, then the Hessian is a 1 trillion square matrix... |
|