| HN Mirror



	by deepnotderp 3500 days ago
	The vanishing gradient problem isn't the same thing as memory efficiency. The memory mirror option is what allows this extremely efficient memory usage, at the cost of only 30% more compute.

1 comment

Yes, but that's not what I asked about.

Vanishing gradients are addressed through model architecture choices: ReLU activations instead of sigmoid or tanh, batch normalization, and LSTMs.

These are orthogonal to memory management and neural net framework choices.
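
To make the activation point concrete, here is a rough sketch of why the choice matters. It multiplies activation derivatives along a chain of layers, which is one factor in the backpropagated gradient. It deliberately ignores the weight matrices, and the depth and pre-activation value are made-up numbers for illustration, so treat it as a toy and not a model of any real network.

```python
import math

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid; it never exceeds 0.25.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, pre_activations):
    # Product of per-layer activation derivatives (weights ignored).
    g = 1.0
    for x in pre_activations:
        g *= grad_fn(x)
    return g

depth = 20                 # hypothetical number of layers
pre_acts = [0.5] * depth   # hypothetical pre-activation at each layer

sig = chained_grad(sigmoid_grad, pre_acts)
relu = chained_grad(relu_grad, pre_acts)

print("sigmoid chain:", sig)   # bounded above by 0.25 ** depth
print("relu chain:   ", relu)  # stays at 1.0 while units are active
```

Nothing in this sketch touches how tensors are allocated or freed, which is the sense in which the two concerns are separate.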