Hacker News
by jefft255 2653 days ago
Not to be pedantic, but I'm not sure that approximating the Hessian using the gradient counts as a second-order method. I was talking about "full-blown" second-order methods, where you compute the Hessian through automatic differentiation (AD).
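For contrast, here's a minimal sketch of what I mean by "full-blown": an undamped Newton step where the exact Hessian comes from AD. It assumes plain Python with nested dual numbers (forward-over-forward AD) standing in for a real AD library, and the names `Dual`, `gradient`, `hessian`, `newton_step` and the Rosenbrock test function are my own illustrative choices. This naive version does n² nested evaluations and stores an n-by-n matrix for n parameters, which is exactly the part that gets expensive at neural-network scale.

```python
class Dual:
    """Dual number val + eps*E with E**2 == 0. Nesting Duals inside Duals
    yields exact second derivatives (forward-over-forward AD)."""

    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    @staticmethod
    def lift(other):
        # Plain numbers are constants: zero derivative part
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.eps - other.eps)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        # Product rule: (a + bE)(c + dE) = ac + (ad + bc)E
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def gradient(f, x):
    """Gradient via one forward-mode pass per coordinate."""
    n = len(x)
    return [f([Dual(x[k], 1.0 if k == i else 0.0) for k in range(n)]).eps
            for i in range(n)]


def hessian(f, x):
    """Full n-by-n Hessian; entry (i, j) from one nested-dual evaluation."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Inner dual seeds direction i, outer dual seeds direction j
            args = [Dual(Dual(x[k], 1.0 if k == i else 0.0),
                         Dual(1.0 if k == j else 0.0, 0.0))
                    for k in range(n)]
            H[i][j] = f(args).eps.eps  # coefficient of E_i * E_j
    return H


def newton_step(f, x):
    """One undamped Newton step x - H^{-1} g, for a 2-parameter f only."""
    g = gradient(f, x)
    H = hessian(f, x)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Solve H p = -g with Cramer's rule (2x2 only, to keep the sketch short)
    p0 = (-g[0] * H[1][1] + g[1] * H[0][1]) / det
    p1 = (-g[1] * H[0][0] + g[0] * H[1][0]) / det
    return [x[0] + p0, x[1] + p1]


def rosenbrock(v):
    # f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, written with +, -, * only
    x, y = v
    a = 1.0 - x
    b = y - x * x
    return a * a + 100.0 * b * b


x = [-1.2, 1.0]
for _ in range(8):
    x = newton_step(rosenbrock, x)
print(x)
```

The point is that the curvature information here is genuine second derivatives of the objective, not statistics of the gradient.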

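Furthermore, I don't think that by "moment of the gradients" they actually mean second derivatives. The second moment in Adam is a running average of the element-wise squared gradient, not anything computed from the Hessian.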
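A minimal sketch of a single Adam update, following Algorithm 1 of the paper with its suggested defaults (step size 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). The function name `adam_step`, the list-based parameters and the toy quadratic are my own choices for illustration. Note that the only input derived from the objective is `grad`; `v` is just an exponential moving average of `g * g`.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. Uses only parameters, the gradient, and the two
    running moment estimates; t is the 1-based step count for bias correction."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g      # 1st moment: EMA of g
        vi = beta2 * vi + (1 - beta2) * g * g  # 2nd raw moment: EMA of g**2
        m_hat = mi / (1 - beta1 ** t)          # bias-corrected estimates
        v_hat = vi / (1 - beta2 ** t)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v


# Usage: minimize f(x) = (x - 3)**2, whose gradient is 2 * (x - 3).
theta, m, v = [0.0], [0.0], [0.0]
for t in range(1, 1001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)
```

On the first step m_hat / sqrt(v_hat) is just g / |g|, so the parameter moves by roughly the step size in the downhill direction regardless of curvature.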

Also from the paper: "We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions..."

It's written right in the abstract that the authors consider it a first-order method.


Seems legit