by njohnson41 4099 days ago
I also like how the backpropagation section starts out by immediately talking about how it is really just chain rule application.

The backwards-moving pattern of "backpropagation" is really just a side effect of the order in which the chain rule is applied, but a lot of intro materials treat backprop as if it were some fancy thing specially designed for neural nets. I suppose "compute the gradient of this function using basic vector calculus" just isn't sexy enough. I complain mostly because it took me a while to figure out whether backprop was exactly the same as gradient descent, or if there were subtle differences.
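For anyone stuck on the same backprop-versus-gradient-descent question, here is a rough sketch of how the two fit together: backprop only computes the gradient, and gradient descent is a separate update rule that consumes it. Everything in it is made up for illustration (the tiny scalar "network" y = w2 * tanh(w1 * x), the squared-error loss, the function names, and the learning rate), so treat it as a toy under those assumptions rather than how any particular library does it.

```python
import math

def forward(w1, w2, x):
    # Toy model (an assumption for illustration): y = w2 * tanh(w1 * x).
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h
    return a, h, y

def loss(w1, w2, x, target):
    # Squared error between the model output and the target.
    return (forward(w1, w2, x)[2] - target) ** 2

def backprop(w1, w2, x, target):
    # Chain rule applied from the loss back towards the inputs.
    a, h, y = forward(w1, w2, x)
    dL_dy = 2.0 * (y - target)        # d/dy of (y - target)^2
    dL_dw2 = dL_dy * h                # dy/dw2 = h
    dL_dh = dL_dy * w2                # dy/dh = w2
    dL_da = dL_dh * (1.0 - h * h)     # d tanh(a)/da = 1 - tanh(a)^2
    dL_dw1 = dL_da * x                # da/dw1 = x
    return dL_dw1, dL_dw2

def gradient_descent_step(w1, w2, x, target, lr=0.1):
    # The optimizer: it only uses the gradient, it does not compute it.
    g1, g2 = backprop(w1, w2, x, target)
    return w1 - lr * g1, w2 - lr * g2
```

Swapping gradient_descent_step for some other optimizer would leave backprop untouched, which is the distinction that took me a while to see.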

1 comment

Another interesting point is that this chain-rule gradient evaluation is essentially a technique called automatic differentiation (backprop corresponds to its reverse mode):

https://en.wikipedia.org/wiki/Automatic_differentiation

which is really cool stuff and should be included more often when talking about backprop.
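To make the connection concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. The Var class, the sin helper, and the backward function are hypothetical names invented for this example, not any library's API, and only addition, multiplication, and sine are supported. Each operation records its local derivatives, and backward walks the recorded graph from the output to the inputs, multiplying and accumulating as the chain rule dictates.

```python
import math

class Var:
    """Scalar that remembers how it was computed (hypothetical helper)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(v):
    # d sin(v)/dv = cos(v)
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))

def backward(output):
    # Order the graph so that inputs come before the nodes that use them.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    # Walk it in reverse: each node's gradient is complete before it is
    # pushed on to its inputs, scaled by the local derivative.
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local
```

For example, with x = Var(2.0), y = Var(3.0), and z = x * y + sin(x), calling backward(z) leaves y.value + cos(x.value) in x.grad and x.value in y.grad. Nothing in it knows about neural nets, which is rather the point.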