Hacker News
by JunaidB 2360 days ago
This is a really important point and I wish I'd mentioned it. The computational considerations (as you've said) make classic (full-batch) gradient descent infeasible in practice. Therefore we resort to stochastic estimates of the gradient or to quasi-Newton approaches (which I'm still looking into).
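For anyone following along, here is a minimal sketch of that trade-off. It assumes a made-up one-parameter least-squares problem (fit y ≈ w * x), and the helper names are hypothetical. The full gradient touches every example on every step, while the stochastic version averages over a random mini-batch, which gives an unbiased but noisy estimate at a fraction of the cost. Quasi-Newton methods are not shown.

```python
import random

# Hypothetical toy problem: fit y ≈ w * x by least squares,
# f(w) = (1/n) * sum_i (w * x_i - y_i)^2.

def full_gradient(w, data):
    # Exact gradient of f: touches every example, O(n) work per step.
    n = len(data)
    return sum(2 * (w * x - y) * x for x, y in data) / n

def stochastic_gradient(w, data, batch_size, rng):
    # Unbiased estimate of the same gradient from a random mini-batch:
    # O(batch_size) work per step instead of O(n).
    batch = rng.sample(data, batch_size)
    return sum(2 * (w * x - y) * x for x, y in batch) / batch_size

def sgd(data, lr=0.05, steps=2000, batch_size=8, seed=0):
    # Plain stochastic gradient descent with a fixed step size.
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w -= lr * stochastic_gradient(w, data, batch_size, rng)
    return w

# Noise-free data generated from y = 3x, so the minimiser is w = 3.
data = [(i / 100, 3 * i / 100) for i in range(1, 101)]
w = sgd(data)
print("SGD estimate of w:", w)
```

With noise-free data every mini-batch gradient vanishes at w = 3, so SGD with a fixed step size settles there; on noisy data you would typically need a decaying step size to converge.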

My main objective was to highlight that, if we are performing classic gradient descent, stepping along the negative gradient yields the greatest (local) reduction in the function value. Essentially, the point was to highlight the underlying calculus. Wayne Winston, in his book Operations Research: Applications and Algorithms, has an interesting passage where he discusses the gradient being the direction of maximum increase (he was looking at steepest ascent).
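To make the calculus concrete: the rate of change of f along a unit vector u is the directional derivative grad f · u, and by Cauchy-Schwarz grad f · u >= -||grad f||, with equality exactly when u = -grad f / ||grad f||. So the negative gradient is the direction of steepest local descent, and the gradient itself is the direction of steepest ascent, which is Winston's framing. The sketch below checks this numerically for a made-up function f(x, y) = x^2 + 3y^2 (chosen only for illustration) by sampling many unit directions and keeping the one with the most negative slope.

```python
import math

def f(x, y):
    # Hypothetical example function: a simple convex quadratic.
    return x**2 + 3 * y**2

def grad_f(x, y):
    # Analytic gradient of f.
    return (2 * x, 6 * y)

def directional_derivative(x, y, ux, uy, h=1e-6):
    # Central finite difference of f along the unit vector (ux, uy).
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

def steepest_sampled_direction(x, y, n_dirs=3600):
    # Try n_dirs evenly spaced unit directions and keep the one
    # along which f decreases fastest (most negative slope).
    best = None
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        slope = directional_derivative(x, y, ux, uy)
        if best is None or slope < best[0]:
            best = (slope, ux, uy)
    return best

x0, y0 = 1.0, 1.0
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)
slope, ux, uy = steepest_sampled_direction(x0, y0)
print("best sampled direction:", (round(ux, 3), round(uy, 3)))
print("negative normalised gradient:", (round(-gx / norm, 3), round(-gy / norm, 3)))
print("best slope:", round(slope, 3), "vs -||grad f||:", round(-norm, 3))
```

The sampled direction should agree with the normalised negative gradient up to the 0.1 degree sampling resolution, and its slope should match -||grad f||.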