	by sinenomine 2079 days ago
	One can use this to express a large part of a deep ML model as a reversible function, which drastically lowers the memory requirements of backpropagation: intermediate activations can be recomputed from a layer's outputs instead of being stored. The same technique, applied by hand, was recently used to speed up training of a transformer language model (https://arxiv.org/abs/2001.04451). I think in the future we will see a trend of expressing most of a DL model as reversible computation, with only minimal irreversible modules at the beginning and the end.
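
	To make the memory argument concrete, here is a minimal sketch of a reversible residual (additive coupling) block in plain Python. The sub-layers `f` and `g` and the helpers `forward`, `inverse`, `run_stack` and `recover_inputs` are hypothetical names picked for illustration, not code from the linked paper. The point is only that the inputs of every block can be rebuilt from its outputs, so nothing has to be cached for the backward pass.

```python
import math

def f(x):
    # Hypothetical sub-layer; it does not need to be invertible itself.
    return [math.tanh(v) for v in x]

def g(x):
    # Second hypothetical sub-layer, also arbitrary.
    return [math.sin(v) for v in x]

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two updates in reverse order by subtracting the same terms.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

def run_stack(x1, x2, depth):
    # Forward pass through `depth` blocks, keeping only the final outputs.
    for _ in range(depth):
        x1, x2 = forward(x1, x2)
    return x1, x2

def recover_inputs(y1, y2, depth):
    # Walk back through the stack, rebuilding each block's inputs on the fly.
    for _ in range(depth):
        y1, y2 = inverse(y1, y2)
    return y1, y2
```

	In a real training loop the backward pass would call something like `inverse` one block at a time and compute gradients from the reconstructed inputs, trading some extra compute for activation memory that no longer grows with depth. The reconstruction is exact only up to floating-point rounding.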