| HN Mirror



	by njohnson41 4146 days ago
	I also like how the backpropagation section starts by immediately pointing out that backprop is really just an application of the chain rule. The backward-moving pattern of "backpropagation" is simply a side effect of the order in which the chain rule is applied, but a lot of intro materials treat backprop as if it were some fancy technique specially designed for neural nets. I suppose "compute the gradient of this function using basic vector calculus" just isn't sexy enough. I complain mostly because it took me a while to figure out whether backprop was exactly the same as gradient descent or whether there were subtle differences.
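To make the chain-rule point concrete, here is a minimal sketch for a hypothetical network with one sigmoid hidden unit, one linear output and a squared-error loss. The function names (`forward`, `backprop`, `numerical_grad`, `gradient_descent_step`) and the network setup are assumptions chosen for illustration. The backward pass is nothing more than the chain rule applied from the loss toward the weights. A finite-difference check confirms the result, and gradient descent shows up as a separate step that only consumes the gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x, t):
    """Hypothetical tiny network: one sigmoid hidden unit, linear output, squared error."""
    z = w1 * x            # hidden pre-activation
    h = sigmoid(z)        # hidden activation
    y = w2 * h            # network output
    loss = (y - t) ** 2   # squared-error loss against target t
    return z, h, y, loss


def backprop(w1, w2, x, t):
    """Gradient of the loss w.r.t. (w1, w2) via the chain rule, applied from the loss backward."""
    _, h, y, _ = forward(w1, w2, x, t)
    dloss_dy = 2.0 * (y - t)              # d(loss)/dy
    dloss_dw2 = dloss_dy * h              # dy/dw2 = h
    dloss_dh = dloss_dy * w2              # dy/dh = w2
    dloss_dz = dloss_dh * h * (1.0 - h)   # dh/dz = sigmoid'(z) = h * (1 - h)
    dloss_dw1 = dloss_dz * x              # dz/dw1 = x
    return dloss_dw1, dloss_dw2


def numerical_grad(w1, w2, x, t, eps=1e-6):
    """Central finite differences: an independent check on the chain-rule result."""
    def loss(a, b):
        return forward(a, b, x, t)[3]
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2


def gradient_descent_step(w1, w2, x, t, lr=0.1):
    """Gradient descent is a separate step: it just uses the gradient backprop produced."""
    g1, g2 = backprop(w1, w2, x, t)
    return w1 - lr * g1, w2 - lr * g2


if __name__ == "__main__":
    w1, w2, x, t = 0.5, -0.3, 1.5, 1.0
    print("chain rule:       ", backprop(w1, w2, x, t))
    print("finite difference:", numerical_grad(w1, w2, x, t))
    new_w1, new_w2 = gradient_descent_step(w1, w2, x, t)
    print("loss before/after one step:",
          forward(w1, w2, x, t)[3], forward(new_w1, new_w2, x, t)[3])
```

In this sketch the division of labor is explicit: `backprop` only computes derivatives, while `gradient_descent_step` is one possible optimizer that uses them.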

1 comment

cafebeen 4146 days ago

Another interesting point: chain-rule gradient evaluation is essentially an instance of automatic differentiation (specifically, reverse-mode automatic differentiation):

https://en.wikipedia.org/wiki/Automatic_differentiation

which is really cool stuff and should be included more often when talking about backprop.
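As a rough illustration of the connection, here is a minimal reverse-mode automatic differentiation sketch. The `Var` class and the `sin` and `backward` helpers are hypothetical names for this example, not any library's API. Each operation records its inputs along with the local derivative. `backward` then walks the recorded graph in reverse topological order and accumulates gradients using the chain rule, which is the same backward sweep that backprop performs on a network's loss. Real systems such as PyTorch's autograd or JAX handle far more (tensors, many more operations, higher-order derivatives).

```python
import math


class Var:
    """Hypothetical scalar node in a computation graph for reverse-mode AD."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def sin(v):
    """sin with its local derivative cos(v) recorded for the backward pass."""
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))


def backward(output):
    """Reverse sweep: visit nodes from the output back to the inputs, applying the chain rule."""
    order, seen = [], set()

    def visit(node):  # post-order DFS yields a topological order (inputs first)
        if node not in seen:
            seen.add(node)
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # every node is finished before its inputs are processed
        for parent, local in node.parents:
            parent.grad += local * node.grad


if __name__ == "__main__":
    x, y = Var(2.0), Var(3.0)
    f = x * y + sin(x)        # f(x, y) = x*y + sin(x)
    backward(f)
    print(x.grad, 3.0 + math.cos(2.0))  # df/dx = y + cos(x)
    print(y.grad, 2.0)                   # df/dy = x
```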