Hacker News
by chestervonwinch 3252 days ago
In short, Newton's method uses second-order derivative information (the Hessian) in the search direction, while gradient descent only uses first-order derivative information (the gradient). In between, there are "quasi-Newton" methods, which include generalizations of the "secant method" and approximate the second-order information from successive gradients instead of computing it directly. I should also mention that there are all sorts of ad-hoc approaches for attempting to increase the convergence rate of gradient descent, e.g., preconditioning, "momentum" terms, etc.
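
To make the comparison concrete, here is a rough 1-D sketch in Python. The test function f(x) = x^2 + exp(x), the step size, the tolerance, and the starting points are all my own assumptions for illustration, nothing canonical. In one dimension the "Hessian" is just f''(x), and the secant method plays the quasi-Newton role by estimating f'' from two successive gradient values.

```python
import math

# Assumed test function: f(x) = x**2 + exp(x), strictly convex with a single minimum.

def grad(x):
    """First derivative f'(x)."""
    return 2.0 * x + math.exp(x)

def hess(x):
    """Second derivative f''(x); the 1-D analogue of the Hessian."""
    return 2.0 + math.exp(x)

def gradient_descent(x, step=0.1, tol=1e-10, max_iter=10_000):
    """First-order only: move against the gradient with a fixed step."""
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, i
        x -= step * g
    return x, max_iter

def newton(x, tol=1e-10, max_iter=100):
    """Second-order: scale the gradient by the inverse of f''."""
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, i
        x -= g / hess(x)
    return x, max_iter

def secant(x_prev, x, tol=1e-10, max_iter=100):
    """Quasi-Newton in 1-D: estimate f'' from two successive gradients."""
    g_prev = grad(x_prev)
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, i
        curvature = (g - g_prev) / (x - x_prev)  # finite-difference stand-in for f''
        x_prev, g_prev = x, g
        x -= g / curvature
    return x, max_iter

if __name__ == "__main__":
    for name, result in [
        ("gradient descent", gradient_descent(1.0)),
        ("newton", newton(1.0)),
        ("secant", secant(1.5, 1.0)),
    ]:
        x_min, iters = result
        print(f"{name:17s} x = {x_min:.8f}  iterations = {iters}")
```

All three should land on the same minimizer, but Newton and the secant variant should need far fewer iterations than gradient descent with this fixed step size. The trade-off is that Newton needs f'' at every step, which in many dimensions means forming and solving with a full Hessian, whereas the secant approach only reuses gradients it already has.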