It tends to need a lot of RAM, but in the projects I've studied the possible speedups are 3-5 orders of magnitude.
Here's a parallel C++ library for it that requires only relatively minor changes to the original algorithm. [1] [2]
I also believe it would be possible to develop a compiler framework to do the transformation mostly automatically, using an MLIR dialect.
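To make the RAM-for-speed trade concrete, here is a toy sketch in plain C++. It is not the PSAC API from [1] [2]; the class name IncrementalSum and its methods are made up for illustration. It caches every partial sum of an array in a binary tree, so after one input changes, only the nodes on the path from that input to the root are recomputed instead of rerunning the whole sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; this is NOT the PSAC library's API.
// Caches the partial sums of n inputs in a binary tree (about 2n stored
// values), so changing one input re-executes only its ancestors.
class IncrementalSum {
public:
    explicit IncrementalSum(const std::vector<long long>& values)
        : n_(values.size()), tree_(2 * values.size(), 0) {
        assert(n_ > 0);
        // Initial run: leaves live at [n, 2n), internal nodes at [1, n).
        for (std::size_t i = 0; i < n_; ++i) tree_[n_ + i] = values[i];
        for (std::size_t i = n_ - 1; i >= 1; --i)
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
    }

    // Change propagation: overwrite one input, then recompute only the
    // cached sums that depend on it. Returns how many internal nodes
    // were recomputed.
    std::size_t update(std::size_t index, long long value) {
        assert(index < n_);
        std::size_t recomputed = 0;
        std::size_t i = n_ + index;
        tree_[i] = value;
        for (i /= 2; i >= 1; i /= 2) {
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
            ++recomputed;
        }
        return recomputed;
    }

    // The cached result for the current inputs.
    long long total() const { return tree_[1]; }

private:
    std::size_t n_;
    std::vector<long long> tree_;
};
```

With 8 inputs, an update touches 3 internal nodes rather than re-adding all 8 values. The gap grows with input size, since the update cost is logarithmic while a rerun from scratch is linear. The price is the extra memory for the cached intermediate results, and that cost is much larger for general programs than for this toy.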
[0] https://en.wikipedia.org/wiki/Incremental_computing
[1] https://dl.acm.org/doi/pdf/10.1145/3409964.3461799
[2] Source code: https://github.com/cmuparlay/psac