I'm sure there's a lot of good material around, but here are some links that are conceptually very close to the linked Autodidax. (Disclaimer: I wrote Autodidax and some of these other materials.)

There's Autodidact [0], a predecessor to Autodidax, which was a simplified implementation of the original Autograd [1]. It focuses on reverse-mode autodiff, not building an open-ended transformation system like Autodidax. It's also pretty close to the content in these lecture slides [2] and this talk [3]. But the autodiff in Autodidax is more sophisticated and reflects clearer thinking. In particular, Autodidax shows how to implement forward- and reverse-modes using only one set of linearization rules (like in [4]).

There's an even smaller and more recent variant [5], a single ~100 line file for reverse-mode AD on top of NumPy, which was live-coded during a lecture. There's no explanatory material to go with it though.

[0] https://github.com/mattjj/autodidact
[1] https://github.com/hips/autograd
[2] https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slid...
[3] http://videolectures.net/deeplearning2017_johnson_automatic_...
[4] https://arxiv.org/abs/2204.10923
[5] https://gist.github.com/mattjj/52914908ac22d9ad57b76b685d19a...
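For readers who want a feel for the "one set of linearization rules" idea before opening the links, here is a rough sketch in plain Python. It is not taken from Autodidax, Autodidact, or the gist, and it is far simpler than any of them: scalars only, no tracing machinery. The names `RULES`, `Node`, `apply_rule`, `toposort`, `jvp`, and `vjp` are all made up for this illustration. Each primitive gets a single rule that returns its output plus its partial derivatives; forward mode pushes tangents through those partials, and reverse mode pulls cotangents back through the very same numbers.

```python
import math

# One rule per primitive (hypothetical table for this sketch): given the
# input values, return the output and the partial derivative w.r.t. each input.
RULES = {
    "add": lambda x, y: (x + y, (1.0, 1.0)),
    "mul": lambda x, y: (x * y, (y, x)),
    "sin": lambda x: (math.sin(x), (math.cos(x),)),
}

class Node:
    """A value plus the parents and partials recorded when it was computed."""
    def __init__(self, value, parents=(), partials=()):
        self.value = value
        self.parents = parents
        self.partials = partials

def apply_rule(name, *args):
    # Evaluate the primitive and remember its linearization.
    value, partials = RULES[name](*(a.value for a in args))
    return Node(value, args, partials)

def toposort(out):
    # Return the nodes reachable from `out`, inputs first and `out` last.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    return order

def jvp(f, x, tangent):
    # Forward mode: propagate tangents from the input towards the output.
    inp = Node(x)
    out = f(inp)
    tangents = {id(inp): tangent}
    for node in toposort(out):
        if node.parents:
            tangents[id(node)] = sum(
                c * tangents.get(id(p), 0.0)
                for c, p in zip(node.partials, node.parents)
            )
    return out.value, tangents.get(id(out), 0.0)

def vjp(f, x, cotangent):
    # Reverse mode: accumulate cotangents from the output back to the input,
    # reusing exactly the partials that forward mode used.
    inp = Node(x)
    out = f(inp)
    cotangents = {id(out): cotangent}
    for node in reversed(toposort(out)):
        g = cotangents.get(id(node), 0.0)
        for c, p in zip(node.partials, node.parents):
            cotangents[id(p)] = cotangents.get(id(p), 0.0) + c * g
    return out.value, cotangents.get(id(inp), 0.0)

def f(x):
    # f(x) = sin(x) * x + x
    return apply_rule("add", apply_rule("mul", apply_rule("sin", x), x), x)

value, tangent = jvp(f, 2.0, 1.0)
_, cotangent = vjp(f, 2.0, 1.0)
expected = math.cos(2.0) * 2.0 + math.sin(2.0) + 1.0  # analytic f'(2)
```

Both `tangent` and `cotangent` should match `expected` up to floating-point rounding. The real systems differ in important ways (arrays, multiple inputs and outputs, composable transformations), so treat this only as a toy showing that neither mode needs its own per-primitive derivative rules.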
I used this code as inspiration for a functional-only implementation (without references/pointers) in Mercury: https://github.com/mclements/mercury-ad