| HN Mirror

They aren't common in deep learning, but if you look to estimation problems like odometry, optimal control, and calibration, the typical approach is to build a least squares estimator that optimizes with a gauss-newton approximation to the Hessian, or other quasi-newton methods. Gradient descent comparatively exhibits very slow convergence in these cases, especially when there is a large condition number. In the case of an actual quadratic loss function, it can (by definition) be solved in one iteration if you have the Hessian. However, getting it efficiently within most learning frameworks is difficult, as they primarily only compute VJPs or HVPs.