by nl 1248 days ago
Yes, it is. LSTMs were developed to fix the vanishing gradient problem.

The 1997 paper where they were introduced puts it like this:

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

https://www.researchgate.net/publication/13853244_Long_Short...
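
To make the "decaying error backflow" versus "constant error flow" contrast concrete, here is a toy sketch. It is not code from the paper: it assumes a single scalar unit, a constant pre-activation of 0.5, and a self-weight of 0.9 for the plain recurrent case, and the function names are made up for illustration. It only multiplies out the per-step factors an error signal picks up on its way back through time, and it leaves out the gates entirely.

```python
import math


def backflow_factor(w, pre_activations):
    """Scale applied to an error signal sent back through one self-connected
    tanh unit: the product of w * tanh'(z_t) over all time steps."""
    factor = 1.0
    for z in pre_activations:
        factor *= w * (1.0 - math.tanh(z) ** 2)  # tanh'(z) = 1 - tanh(z)^2
    return factor


def carousel_factor(steps):
    """Same product for a linear unit whose self-weight is fixed at 1.0
    (the 'constant error carousel' idea): each step contributes exactly 1."""
    factor = 1.0
    for _ in range(steps):
        factor *= 1.0 * 1.0  # self-weight * derivative of the identity
    return factor


steps = 1000
zs = [0.5] * steps  # assumed pre-activations, chosen only for illustration

print("tanh unit, w=0.9:", backflow_factor(0.9, zs))   # shrinks toward zero
print("linear unit, w=1:", carousel_factor(steps))     # stays at 1.0
```

With these assumed numbers the tanh unit's factor collapses to something vanishingly small after 1000 steps, while the weight-1 linear unit passes the error back unchanged. That second behaviour is the property the abstract is describing.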

In practice, though, LSTMs usually aren't competitive with transformers on problems that need long-range understanding.