Hacker News
by
deepnotderp
3500 days ago
Vanishing gradients aren't the same problem as memory efficiency. The memory mirror option is what enables this extremely efficient memory usage, at the cost of only about 30% more computation.
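As I understand it, the mirror option trades memory for recomputation, in the same spirit as gradient checkpointing: keep only some activations during the forward pass and recompute the rest during backprop. Here's a minimal sketch of that general technique in plain Python on a hypothetical toy chain of scalar tanh layers. The names (`forward_backward`, `segment`) are made up for illustration and are not any framework's mirror API. It counts stored activations and forward layer evaluations so the trade-off is visible.

```python
import math


def layer(w, x):
    """One toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x, upstream):
    """Backprop through one layer; returns (grad wrt x, grad wrt w)."""
    t = math.tanh(w * x)
    local = 1.0 - t * t  # derivative of tanh evaluated at w * x
    return upstream * w * local, upstream * x * local


def forward_backward(weights, x0, segment=None):
    """Return (output, grads, peak_stored, forward_evals).

    segment=None keeps every activation (plain backprop).
    segment=k keeps only every k-th activation and recomputes the
    ones in between, one segment at a time, during the backward pass.
    """
    n = len(weights)
    step = 1 if segment is None else segment
    evals = 0

    # Forward pass: keep only the checkpointed activations.
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        evals += 1
        if (i + 1) % step == 0:
            checkpoints[i + 1] = x
    peak = len(checkpoints)

    # Backward pass, walking segments from the last one to the first.
    grads = [0.0] * n
    upstream = 1.0  # d(output)/d(output)
    end = n
    while end > 0:
        start = ((end - 1) // step) * step  # largest checkpoint index below `end`
        acts = [checkpoints[start]]  # acts[j] is the input to layer start + j
        for i in range(start, end - 1):
            acts.append(layer(weights[i], acts[-1]))  # recomputation
            evals += 1
        # Peak = stored checkpoints plus the extra recomputed activations.
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for i in range(end - 1, start - 1, -1):
            upstream, grads[i] = layer_grad(weights[i], acts[i - start], upstream)
        end = start
    return x, grads, peak, evals


if __name__ == "__main__":
    weights = [0.9 + 0.01 * i for i in range(16)]
    _, g_full, peak_full, evals_full = forward_backward(weights, 0.5)
    _, g_ckpt, peak_ckpt, evals_ckpt = forward_backward(weights, 0.5, segment=4)
    print(peak_full, evals_full)  # 17 16
    print(peak_ckpt, evals_ckpt)  # 8 28
    print(g_full == g_ckpt)       # True
```

With 16 layers and segments of 4, the sketch holds 8 activations at peak instead of 17 and does 28 forward layer evaluations instead of 16, with identical gradients. Those 12 extra evaluations are forward work only; if a backward step costs roughly twice a forward step (an assumption, not a measurement), the overhead on the whole training step is about 25%, which is in the same ballpark as the figure above. Picking the segment size near the square root of the depth keeps the peak small.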
bsfjgngdnxy
3500 days ago
Yes, but that's not what I asked about.
alexbeloi
3500 days ago
The vanishing gradient problem is addressed through model architecture choices: ReLU activations instead of sigmoid or tanh, batch normalization, and LSTMs.
These are orthogonal to memory management and neural net framework choices.
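To make the activation point concrete, here's a minimal sketch in plain Python of why a deep sigmoid chain shrinks gradients while a ReLU chain on positive inputs doesn't. It assumes a deliberately simplified setting (identical scalar layers, every pre-activation equal, unit weights), and the function names are hypothetical. It doesn't model batch normalization or LSTMs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z == 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for any positive input


def chain_gradient(local_grad, depth, z, weight=1.0):
    """d(output)/d(input) through `depth` identical layers.

    Simplifying assumption: every layer sees the same pre-activation `z`
    and has the same scalar `weight`, so the chain rule reduces to a
    product of `depth` identical factors.
    """
    g = 1.0
    for _ in range(depth):
        g *= weight * local_grad(z)
    return g


if __name__ == "__main__":
    # Best case for sigmoid (z = 0): each layer still scales the gradient by 0.25.
    print(chain_gradient(sigmoid_grad, 20, z=0.0))  # about 9.1e-13
    # ReLU on positive inputs passes the gradient through unchanged.
    print(chain_gradient(relu_grad, 20, z=1.0))     # 1.0
```

Nothing here depends on how activations are stored or which framework runs the model, which is why these fixes are orthogonal to memory management. The flip side, also visible in the sketch, is that ReLU's gradient is exactly 0 for negative inputs.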