by slashcom 2644 days ago
And replaced by residual connections in transformers, which are absolutely dominating LSTMs now.
1 comment

Transformer-XL uses recurrence, and most NLP SOTA results still come from LSTMs. I’m not sure I’d expect attention mechanisms to fully replace recurrence.