Hacker News
by rotskoff 2240 days ago
As many have pointed out, this rediscovers Newton's method. The reason this kind of approach, with "Hessian preconditioning", is not widely used in practice is that computing the Hessian is costly: for n parameters it has n^2 entries, and solving the resulting linear system takes O(n^3) time in general. Avoiding that extra computation is the idea underlying "quasi-Newton" methods like BFGS and (more loosely) popular methods like Adagrad.
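
To make the trade-off concrete, here is a rough sketch in Python with NumPy. Everything in it is my own assumption for illustration: the toy quadratic (A, b), the function names, and the step counts are hypothetical, and the BFGS loop uses an exact line search that only works because the objective is quadratic. It compares plain gradient descent, a full Newton step that solves against the Hessian, and a bare-bones BFGS that builds an inverse-Hessian estimate from gradient differences alone.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: f(x) = 0.5 * x^T A x - b^T x.
# Its gradient is A x - b and its Hessian is the constant matrix A.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)  # exact minimizer, used only for comparison


def grad(x):
    return A @ x - b


def gradient_descent(x, lr, steps):
    # Plain first-order updates; slow along the low-curvature direction.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def newton_step(x):
    # Precondition the gradient with the inverse Hessian.
    # The linear solve is the expensive part for large n.
    return x - np.linalg.solve(A, grad(x))


def bfgs(x, steps):
    # Minimal BFGS: H approximates the inverse Hessian using gradients only.
    n = len(x)
    I = np.eye(n)
    H = np.eye(n)
    g = grad(x)
    for _ in range(steps):
        if np.linalg.norm(g) < 1e-12:
            break  # already converged; avoid dividing by zero below
        p = -H @ g
        # Exact line search, valid only because f is quadratic.
        alpha = -(g @ p) / (p @ A @ p)
        s = alpha * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        rho = 1.0 / (y @ s)
        # Standard BFGS update of the inverse-Hessian estimate.
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x


x0 = np.zeros(2)
x_gd = gradient_descent(x0, lr=0.01, steps=100)
x_newton = newton_step(x0)
x_bfgs = bfgs(x0, steps=2)

print("gradient descent error:", np.linalg.norm(x_gd - x_star))
print("Newton error:          ", np.linalg.norm(x_newton - x_star))
print("BFGS error:            ", np.linalg.norm(x_bfgs - x_star))
```

On a quadratic, Newton lands on the minimizer in one step, and BFGS with exact line search gets there in n steps without ever forming the Hessian. On real problems neither guarantee holds, and BFGS would use an inexact line search, so treat this as a toy illustration rather than a usable optimizer.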