by ottaborra 484 days ago
RNN with extra steps?

There are many papers that use a recurrence across sub-sequences and attention within sub-sequences. Google did this with Infini-Attention and one of the variants from the Titans paper. However, I think the earliest example of this is Transformer-XL.
Isn't that all of modern AI?
Transformers are completely unlike RNNs.
There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which has a recurrent form.

I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.

*If you remove the softmax from the attention formula, you end up with linear attention*

Sorry, what?

Here is a paper explaining it: https://arxiv.org/abs/2006.16236
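
Roughly, the idea can be sketched like this. This is my own minimal NumPy sketch, not the paper's code: the function names are made up for illustration, and I'm assuming the elu(x) + 1 feature map in place of the softmax. Once the similarity is just a dot product of feature vectors, causal attention can be computed either as a masked matrix product or as a loop that carries a fixed-size state, and both give the same output.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so always positive.
    # (Assumed stand-in for the softmax; other positive feature maps also work.)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_parallel(Q, K, V):
    # Q, K: (T, d); V: (T, d_v). Attention-style form with a causal mask.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = np.tril(phi_q @ phi_k.T)  # (T, T), position t only sees s <= t
    return (scores @ V) / scores.sum(axis=1, keepdims=True)

def linear_attention_recurrent(Q, K, V):
    # Same computation as an RNN: the state is (S, z), independent of T.
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))  # running sum of phi(k_s) v_s^T
    z = np.zeros(d)                # running sum of phi(k_s)
    out = np.zeros(V.shape)
    for t in range(T):
        q, k = feature_map(Q[t]), feature_map(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((6, 4))
V = rng.standard_normal((6, 3))
assert np.allclose(linear_attention_parallel(Q, K, V),
                   linear_attention_recurrent(Q, K, V))
```

With the softmax in place you can't do this, because exp(q·k) doesn't factor into a product of a function of q and a function of k with finite-dimensional features, so there is no fixed-size state to carry forward.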