|
|
|
|
|
by delluminatus
3809 days ago
|
|
Great post, as an AD tutorial and as a (an?) Haskell exercise. Having known nothing about AD before, I feel like I have a good understanding of what it is -- as he says, it's so simple -- but I don't understand why the algorithm is so much faster. Just looking at the differentiator function and the AD function, it actually appears that the AD should take longer because it does more computation per step (both the function and the derivative). But it seems like every article or paper is talking about how to implement AD, not why the algorithm is so efficient. Does anyone happen to know of a good article or paper about that? Ideally, one just as nice and comprehensible as this! |
|
So first it blows up the size of the expression to process and then it calculates the result. A lot of those calculations will be redundant
The second one not only avoids evaluating the tree separately, but "prunes" a lot of the evaluation automatically by effectively short-circuiting the whole process. Consider (with the caveat that my Haskell understanding is rudimentary at best and it was about 20 years since I last did anything involving symbolic differentiation) :
Followed by a separate run with: vs (I pick the "Product" rule as an example because it is one of the ones that blows up the size of the tree)Lets say you do something simple like Product X X. You get Sum (Product X One) (Product One X) out, and then you have to evaluate each node.
In the second case, you match Product e e'. You process X and assign (x,1) to (ex,ed), and process the second X and assign (x,1) to (ex', ed'), and then return (ex * ex', ex * ed' + ed * ex').
In the first case, you've first differentiated 3 nodes (Product + 2x "X"), then evaluated the 7 nodes that was produced as output, for a total of ten nodes processed.
In the second you've evaluated/differentatiated 3 nodes in one go without the intermediate step of having to evaluate a much larger tree.
In a large example, the number of nodes in the differentiated output quickly explodes and subsequent evaluation would increase rapidly in cost.