by pacmansyyu 3271 days ago
Here[1] is an article describing the same, written by the author himself.

[1]: http://ruder.io/optimizing-gradient-descent/index.html