by rotskoff
2240 days ago
As many have pointed out, this rediscovers Newton's method. The reason this kind of Hessian preconditioning is not widely used in practice is that computing the Hessian is costly: for n parameters it has n^2 entries, and you still have to solve a linear system with it at every step. Avoiding that additional computation is the idea underlying quasi-Newton methods like BFGS and (more loosely) popular methods like Adagrad.
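To make the trade-off concrete, here is a minimal sketch in plain Python comparing one Hessian-preconditioned (Newton) step with ordinary gradient descent. The quadratic, the starting point, the learning rate, and all function names are my own illustrative assumptions, not anything from the post under discussion.

```python
# Minimize f(x) = 0.5 * x^T A x - b^T x in 2D.
# A, b, x0 and lr are illustrative values chosen for this sketch.
A = [[3.0, 1.0], [1.0, 2.0]]   # Hessian of f (constant for a quadratic)
b = [1.0, 1.0]

def grad(x):
    # Gradient of f: A x - b
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]

def solve2(M, g):
    # Solve M p = g for a 2x2 matrix using Cramer's rule.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * g[0] - M[0][1] * g[1]) / det,
            (M[0][0] * g[1] - M[1][0] * g[0]) / det]

def newton_step(x):
    # Hessian-preconditioned step: x - H^{-1} grad f(x)
    p = solve2(A, grad(x))
    return [x[0] - p[0], x[1] - p[1]]

def gradient_descent(x, lr=0.1, steps=50):
    # Plain first-order updates: x - lr * grad f(x), repeated `steps` times
    for _ in range(steps):
        g = grad(x)
        x = [x[0] - lr * g[0], x[1] - lr * g[1]]
    return x

x0 = [5.0, -3.0]
x_newton = newton_step(x0)    # one step, needs the full Hessian
x_gd = gradient_descent(x0)   # fifty steps, needs only gradients
print(x_newton, x_gd)
```

On a quadratic the single Newton step lands on the minimizer (up to rounding), while gradient descent is still approaching it after 50 iterations. The catch is the `solve2` call: in n dimensions that becomes forming and solving an n-by-n system, which is the cost quasi-Newton methods try to avoid by approximating the Hessian from gradients alone.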