| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by delluminatus 3856 days ago
	Great post, as an AD tutorial and as a (an?) Haskell exercise. Having known nothing about AD before, I feel like I have a good understanding of what it is -- as he says, it's so simple -- but I don't understand why the algorithm is so much faster. Just looking at the differentiator function and the AD function, it actually appears that the AD should take longer because it does more computation per step (both the function and the derivative). But it seems like every article or paper is talking about how to implement AD, not why the algorithm is so efficient. Does anyone happen to know of a good article or paper about that? Ideally, one just as nice and comprehensible as this!

4 comments

vidarh 3856 days ago

The first alternative builds a large tree structure, and then evaluates the whole tree structure afterwards.

So first it blows up the size of the expression to process and then it calculates the result. A lot of those calculations will be redundant

The second one not only avoids evaluating the tree separately, but "prunes" a lot of the evaluation automatically by effectively short-circuiting the whole process. Consider (with the caveat that my Haskell understanding is rudimentary at best and it was about 20 years since I last did anything involving symbolic differentiation) :

   Product e e' -> Product e (diff e') `Sum` Product (diff e) e'

Followed by a separate run with:

    Product e e' -> ev e * ev e'

    Product e e' -> let (ex, ed)   = ev e
                            (ex', ed')       = ev e'
                    in (ex * ex', ex * ed' + ed * ex')

(I pick the "Product" rule as an example because it is one of the ones that blows up the size of the tree)

Lets say you do something simple like Product X X. You get Sum (Product X One) (Product One X) out, and then you have to evaluate each node.

In the second case, you match Product e e'. You process X and assign (x,1) to (ex,ed), and process the second X and assign (x,1) to (ex', ed'), and then return (ex * ex', ex * ed' + ed * ex').

In the first case, you've first differentiated 3 nodes (Product + 2x "X"), then evaluated the 7 nodes that was produced as output, for a total of ten nodes processed.

In the second you've evaluated/differentatiated 3 nodes in one go without the intermediate step of having to evaluate a much larger tree.

In a large example, the number of nodes in the differentiated output quickly explodes and subsequent evaluation would increase rapidly in cost.

link

abecedarius 3856 days ago

Memoizing the symbolic differentiator ought to eliminate the blowup, as I commented in the other thread the other day.

link

amelius 3856 days ago

From an asymptotic complexity viewpoint, I don't see any difference between the two algorithms (AD versus building an expression tree and doing it symbolically, then evaluating). Both are linear in the "size" of the expression. So I don't understand what you mean by "quickly explodes".

link

tome 3855 days ago

See here:

http://h2.jaguarpaw.co.uk/posts/why-is-naive-symbolic-differ...

In summary, yes both functions are linear, but the size of the symbolic derivative is quadratic.

link

vidarh 3853 days ago

The result of the symbolic derivation grows faster than linearly, as you can see from the Product rule for example.

As a result, while the evaluation is linear with input tree size, the tree it is evaluating is typically much larger than the input expression.

link

pjscott 3856 days ago

The clearest article I've seen on the subject is this one:

https://colah.github.io/posts/2015-08-Backprop/

link

delish 3855 days ago

> as a (an?) Haskell exercise

In american english, this would be "a Haskell exercise," pronouncing "a" to rhyme with "hey" or "ah." In british english, it would be "an 'askell exercise."

I'm american, though. British people please weigh in.

link

tome 3856 days ago

> I don't understand why the algorithm is so much faster

I hope this helps!

http://h2.jaguarpaw.co.uk/posts/why-is-naive-symbolic-differ...

link