
But the chain rule for ordered derivatives is exactly the backprop rule. It's just the mathematical representation of 'the simple implementation' I mentioned.

I think what you're saying is that you find the process intuitive. I don't have much of a way to argue with that. But I think it's important to note that we're dealing with two things: 1. a process that we follow (backprop), 2. a true answer that is obtainable using only the chain rule. And yes, it turns out that (1) and (2) give the same answer. But (2) requires much more work, and I question anyone who claims that (1) is 'obvious' from (2): getting (1) from (2) takes work.

I'm guessing you'll agree that using only the chain rule takes much more work, but in case you don't: consider a fully connected graph with at least 5 variables, say a = 5; b = 2 a; c = 2 a b; d = 2 a b c; e = 2 a b c d. If you use backprop, you can compute de/da rapidly, because each variable's derivative is computed once and reused. If you use only the chain rule and expand everything out, you get one term per path from a to e. With these 5 variables that is already 8 terms, and every extra variable doubles the count, so the number of terms you have to deal with increases exponentially fast with the number of variables.
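
To make the comparison concrete, here is a rough Python sketch of both approaches on that example. It assumes "using only the chain rule" means expanding de/da into a sum over every path from a to e, which is one reading of the comment and not necessarily the only one. The function names (`forward`, `local_partial`, `backprop`, `path_sum`) are made up for illustration.

```python
from math import prod

def forward(n, a=5):
    """Values for v0 = a and v_i = 2 * v0 * ... * v_{i-1} (n variables)."""
    vals = [a]
    for _ in range(1, n):
        vals.append(2 * prod(vals))
    return vals

def local_partial(vals, i, j):
    """Partial of v_i with respect to v_j (j < i), other inputs held fixed."""
    return 2 * prod(vals[k] for k in range(i) if k != j)

def backprop(vals):
    """Reverse sweep: each adjoint d(last)/d(v_j) is computed once and reused."""
    n = len(vals)
    adj = [0] * n
    adj[-1] = 1
    for j in range(n - 2, -1, -1):
        adj[j] = sum(adj[i] * local_partial(vals, i, j)
                     for i in range(j + 1, n))
    return adj[0]

def path_sum(vals):
    """Fully expanded chain rule: sum over all paths of the edge partials.

    Returns (derivative, number_of_paths_visited).
    """
    n = len(vals)

    def walk(j):
        if j == n - 1:
            return 1, 1
        total, paths = 0, 0
        for i in range(j + 1, n):
            sub_total, sub_paths = walk(i)
            total += local_partial(vals, i, j) * sub_total
            paths += sub_paths
        return total, paths

    return walk(0)

if __name__ == "__main__":
    vals = forward(5)            # a, b, c, d, e from the example
    print(backprop(vals))        # de/da via the reverse sweep
    print(path_sum(vals))        # same de/da, plus how many paths were expanded
```

Both functions return the same derivative. The difference is that `backprop` does on the order of n^2 multiply-adds for this graph, while `path_sum` visits 2^(n-2) paths, so calling `path_sum(forward(n))` for growing n shows the blow-up directly.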