Hacker News new | ask | show | jobs
by sinenomine 2079 days ago
One can use this to express a large part of a deep ML model as a reversible function, thus drastically lowering memory requirements for backpropagation. The same technique, but applied by hand was recently used to speed up training for transformer language model: https://arxiv.org/abs/2001.04451

I think in the future we will see a trend of expressing most of DL model as reversible computation, with minimal irreversible module in the end and in the beginning.