Hacker News new | ask | show | jobs
by nl 1251 days ago
Read about "RNN Vanishing Gradients". LSTMs help here, but see eg https://medium.com/analytics-vidhya/why-are-lstms-struggling... for the problems there.
1 comments

My understanding that LSTM is a kind of RNN.
Yes it is. They were developed to fix the vanishing gradient problem.

The 1997 paper where they were introduced puts it like this:

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

https://www.researchgate.net/publication/13853244_Long_Short...

Usually they aren't competitive with transformers on long-range understanding problems though.