by jefft255
2653 days ago
Not to be pedantic, but I don't know if approximating the Hessian using the gradient counts as a second-order method. I was talking about "full-blown" second-order methods, where you compute the Hessian through AD. Furthermore, I don't think that by "moment of the gradients" they actually mean second derivatives. Also, from the paper: "We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions..." It's written right in the abstract that the authors consider it a first-order method.
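To make the moment-vs-derivative distinction concrete, here is a rough sketch (mine, not from the paper) in plain Python. It assumes a toy objective f(x) = 3x and the usual default beta2 = 0.999; the function names are made up for illustration. The point is that Adam's second-moment estimate is a running average of squared gradients, so it can be nonzero even when the second derivative is exactly zero.

```python
# Toy objective: f(x) = 3 * x, so f'(x) = 3 everywhere and f''(x) = 0.
def grad(x):
    return 3.0

def adam_second_moment(steps, beta2=0.999):
    """Bias-corrected running average of squared gradients (Adam's v-hat).

    Only the second-moment part of Adam is shown; no parameter update is done.
    """
    v = 0.0
    for _ in range(steps):
        g = grad(0.0)
        v = beta2 * v + (1.0 - beta2) * g * g
    return v / (1.0 - beta2 ** steps)

def second_derivative(f_prime, x, h=1e-4):
    """Central finite difference of the gradient, i.e. an actual curvature estimate."""
    return (f_prime(x + h) - f_prime(x - h)) / (2.0 * h)

v_hat = adam_second_moment(100)
curvature = second_derivative(grad, 0.0)
print("second moment of the gradient:", v_hat)
print("second derivative:", curvature)
```

Here v_hat comes out at roughly 9 (the squared gradient) while the curvature is 0, so the "second moment" carries no Hessian information in this case.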