Hacker News new | ask | show | jobs
by tiarkrompf 2733 days ago
It's different -- instead of returning a closure, we take a closure as additional parameter (check paper for details). This means that the call stack stays intact and all intermediate values can be stack-allocated.
2 comments

Could you please elaborate on what exactly enables stack allocation of intermediate values?

Is it a direct result of the semantics of delimited continuations? (perhaps because closures are passed as parameters and don't capture variables)

Or is it enabled via staging or optimizations?

It's specific to CPS form: every basic statement is a full function call, which moves to the next statement by calling another function (i.e. the continuation). Normally this would just be a very quick way to get yourself a stack overflow, so typical CPS-form compilers optimise (or require) the tail-call case, where you can pop the stack frame for the last instruction before moving on to the next. But for AD that's not a big deal, you just re-use those values in the backwards pass.

I doubt it's better than just allocating your own stack on the heap, especially when you have something like a differential equation solver with millions of statements, but it's still a neat trick.

Aha, that makes sense, thanks!

I agree that the performance benefits of such stack allocation (over heap allocation) aren't quite clear in practice.

I feel the bigger win of delimited cont./closure-based AD approaches is that they can model the control flow of reverse-mode AD without AD-specific code transformations. Delimited cont. is especially great at making things modular: each differentiable function performs primal computation, calls the callback with primal result, then performs adjoint computation.

OK, thanks, sounds worth trying to wrap my head around...

(Note BTW that the github page has a dead link to the paper.)

Thanks! Fixed now.