by tfgg 3560 days ago
It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.
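To make the point concrete, here's a rough toy sketch (mine, not from the paper). It assumes a scalar linear recurrence h_t = factor * h_{t-1}, so the gradient of h_T with respect to h_0 is just factor^T. The function name and the specific factors are made up for illustration, and "forget gate held at 1" is a simplifying assumption: in a real LSTM the gate is learned, lies between 0 and 1, and depends on the input.

```python
def recurrent_gradient(factor: float, steps: int) -> float:
    """Gradient of h_T w.r.t. h_0 for the toy recurrence h_t = factor * h_{t-1}.

    Hypothetical helper for illustration only; it ignores nonlinearities
    and treats the per-step factor as a constant.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= factor  # chain rule: one multiplicative factor per time step
    return grad


steps = 100

# Plain recurrence with |factor| < 1: gradient shrinks exponentially.
vanishing = recurrent_gradient(0.9, steps)

# Plain recurrence with |factor| > 1: gradient grows exponentially.
exploding = recurrent_gradient(1.1, steps)

# LSTM-style additive cell state, assuming the forget gate stays at 1:
# each step contributes a factor of exactly 1, so the gradient is preserved.
lstm_like = recurrent_gradient(1.0, steps)

print(f"factor 0.9: {vanishing:.3e}")
print(f"factor 1.1: {exploding:.3e}")
print(f"factor 1.0: {lstm_like:.3e}")
```

With a forget gate below 1 you still get exponential decay, just with a rate the network can learn to push close to zero, which is the "longer time-scales" part.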