| HN Mirror

You'd pay the cost of the core computation O(n) times. Matrix products under the derivative fibration (jet; whatever your algebra calls it) are just more matrix products. A good sized NN is already in the heap. Also, the hard part is finding the ideal combination of fwd vs rev transforms (it's NP hard). This is similar to the complexity of finding the ideal subblock matrix multiply orchestration.

So, the killer cost is at compile time, not runtime, which is fundamental to the underlying autograd operation.

On the flip side, it's 2025, not 2006, so pro modern algorithms & heuristics can change this story quite a bit.

All of this is spelled out in Griewank's work (the book).