by andbberger
2653 days ago
From the paper

> We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients

Most of the popular variants of SGD use approximations of the Hessian in one way or another.

Furthermore, I don't think "second moments of the gradients" means second derivatives here: the second moment is a running average of the squared gradients, which needs no Hessian information.
Also from the paper:

> We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions...
It's written right in the abstract that the authors consider it a first-order method.
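
To make the point concrete, here is a rough sketch of a single-parameter Adam update as I read the paper's algorithm. The function names (`adam_step`, `grad_f`) and the toy objective are my own assumptions for illustration, not anything from the paper; the hyperparameter defaults are the ones the paper suggests. Note that the only thing the update ever sees is the gradient `grad`: both "moments" are exponential moving averages of `grad` and `grad * grad`, and no second derivative is computed anywhere.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative sketch)."""
    # First moment estimate: moving average of the gradient itself.
    m = beta1 * m + (1 - beta1) * grad
    # Second (uncentered) moment estimate: moving average of the SQUARED
    # gradient. This is not a second derivative.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, since m and v start at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step, scaled by the root of the second moment.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def grad_f(x):
    # Hypothetical toy objective f(x) = (x - 3)^2; only its first
    # derivative is ever evaluated.
    return 2.0 * (x - 3.0)


theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    theta, m, v = adam_step(theta, grad_f(theta), m, v, t, lr=0.01)
```

On this toy problem `theta` should end up close to the minimizer at 3, using nothing but first-order gradient evaluations.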