Hacker News new | ask | show | jobs
by vidarh 3814 days ago
The first alternative builds a large tree structure, and then evaluates the whole tree structure afterwards.

So first it blows up the size of the expression to process and then it calculates the result. A lot of those calculations will be redundant

The second one not only avoids evaluating the tree separately, but "prunes" a lot of the evaluation automatically by effectively short-circuiting the whole process. Consider (with the caveat that my Haskell understanding is rudimentary at best and it was about 20 years since I last did anything involving symbolic differentiation) :

   Product e e' -> Product e (diff e') `Sum` Product (diff e) e'
Followed by a separate run with:

    Product e e' -> ev e * ev e'
vs

    Product e e' -> let (ex, ed)   = ev e
                            (ex', ed')       = ev e'
                    in (ex * ex', ex * ed' + ed * ex')
(I pick the "Product" rule as an example because it is one of the ones that blows up the size of the tree)

Lets say you do something simple like Product X X. You get Sum (Product X One) (Product One X) out, and then you have to evaluate each node.

In the second case, you match Product e e'. You process X and assign (x,1) to (ex,ed), and process the second X and assign (x,1) to (ex', ed'), and then return (ex * ex', ex * ed' + ed * ex').

In the first case, you've first differentiated 3 nodes (Product + 2x "X"), then evaluated the 7 nodes that was produced as output, for a total of ten nodes processed.

In the second you've evaluated/differentatiated 3 nodes in one go without the intermediate step of having to evaluate a much larger tree.

In a large example, the number of nodes in the differentiated output quickly explodes and subsequent evaluation would increase rapidly in cost.

2 comments

Memoizing the symbolic differentiator ought to eliminate the blowup, as I commented in the other thread the other day.
From an asymptotic complexity viewpoint, I don't see any difference between the two algorithms (AD versus building an expression tree and doing it symbolically, then evaluating). Both are linear in the "size" of the expression. So I don't understand what you mean by "quickly explodes".
See here:

http://h2.jaguarpaw.co.uk/posts/why-is-naive-symbolic-differ...

In summary, yes both functions are linear, but the size of the symbolic derivative is quadratic.

The result of the symbolic derivation grows faster than linearly, as you can see from the Product rule for example.

As a result, while the evaluation is linear with input tree size, the tree it is evaluating is typically much larger than the input expression.