Hacker News new | ask | show | jobs
by bsfjgngdnxy 3500 days ago
Yes, but that's not what I asked about.
1 comments

Vanishing gradient is solved using model architectural choices: ReLu activation instead of sigmoid or tanh, using batch-normalization, using LSTMs

These are orthogonal to memory management and neural net framework choices.