| Anyone who believes that this completes their understanding of automatic differentiation is tricking themselves. When your graph is a TREE, everything is very simple, as in this post. When your graph is instead a more general directed acyclic graph (e.g., x = 5; y = 2x; z = xy, where x reaches z both directly and through y), the IMPLEMENTATION is still very simple, but understanding WHY that implementation works is not (to repeat: if you think it’s ‘just the ordinary chain rule’, you are tricking yourself). One of the earliest descriptions of this was by Paul Werbos. He called the required rule “the chain rule for ordered derivatives”, which he proved by induction from the ordinary chain rule. It is nevertheless not immediately evident from the ordinary chain rule. I welcome anyone who believes otherwise to prove me wrong. If you do, I will be very happy. |
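
To make the DAG example concrete, here is a minimal reverse-mode sketch in Python for x = 5; y = 2x; z = xy. It is only an illustration under my own assumptions, not the code from the post: the names `Node`, `scale`, `mul`, and `backward` are hypothetical, and the topological order is passed in by hand rather than computed.

```python
# Minimal reverse-mode sketch for the DAG x = 5; y = 2x; z = x*y.
# Node, scale, mul, and backward are hypothetical names for illustration.

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local partial of this node w.r.t. parent).
        self.parents = parents
        self.grad = 0.0


def scale(c, a):
    """Return the node c * a for a constant c."""
    return Node(c * a.value, ((a, c),))


def mul(a, b):
    """Return the node a * b."""
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def backward(output, order):
    """Accumulate d(output)/d(node) for every node.

    `order` must list the nodes in topological order (inputs first),
    so that a node's grad is complete before it is pushed to its parents.
    """
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # The += is the whole point: x receives one contribution
            # directly from z and another one through y.
            parent.grad += node.grad * local


x = Node(5.0)
y = scale(2.0, x)
z = mul(x, y)
backward(z, [x, y, z])

print(z.value)  # 50.0
print(x.grad)   # 20.0
```

The result agrees with the closed form z = 2x², whose derivative 4x is 20 at x = 5. It arrives as the sum of the direct contribution (y = 10) and the contribution through y (x · 2 = 10). The implementation is just the `+=` applied in reverse topological order; the claim above is that justifying that accumulation for an arbitrary DAG takes more than quoting the ordinary chain rule.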