|
|
|
|
|
by sinenomine
2079 days ago
|
|
One can use this to express a large part of a deep ML model as a reversible function, thus drastically lowering memory requirements for backpropagation. The same technique, but applied by hand was recently used to speed up training for transformer language model: https://arxiv.org/abs/2001.04451 I think in the future we will see a trend of expressing most of DL model as reversible computation, with minimal irreversible module in the end and in the beginning. |
|