Hacker News new | ask | show | jobs
by infogulch 1528 days ago
The most interesting thing I've seen on AD is "The simple essence of automatic differentiation" (2018) [1]. See past discussion [2], and talk [3]. I think the main idea is that by compiling to categories and pairing up a function with its derivative, the pair becomes trivially composable in forward mode, and the whole structure is easily converted to reverse mode afterwards.

[1]: https://dl.acm.org/doi/10.1145/3236765

[2]: https://news.ycombinator.com/item?id=18306860

[3]: Talk at Microsoft Research: https://www.youtube.com/watch?v=ne99laPUxN4 Other presentations listed here: https://github.com/conal/essence-of-ad

1 comments

> the whole structure is easily converted to reverse mode afterwards.

Unfortunately it's not. Elliot never actually demonstrates in the paper how to implement such an algorithm, and it's very hard to write compiler transformations in "categorical form".

(Disclosure: I'm the other of another paper on AD.)

I think JAX effectively demonstrates that this is indeed possible. The approach they use is to first linearise the JAXPR and then transpose it, pretty much in the same fashion as the Elliot paper did.
A JAXPR is a normal functional-style expression tree, with free variables, like

    let b = f a in g (a, b)
Elliot's "compiling to categories" requires you to translate this to

    g ∘ (id × f) ∘ dup
It's pretty baffling to work with terms like the latter in practice! The main ideas that JAX is based on were already published in "Lambda the ultimate backpropagator" in 2008. Elliot's work is a nice way of conceptualising the AD transformations and understanding how they all relate to each other, but it's not particularly practical.
which paper?