by biofox 486 days ago
Isn't that all of modern AI?
1 comments

Transformers are completely unlike RNNs.
There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which has a recurrent form.

I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.

*If you remove the softmax from the attention formula, you end up with linear attention*

Sorry, what?

Here is a paper explaining it: https://arxiv.org/abs/2006.16236
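
To make the claim concrete, here is a rough sketch, assuming causal attention and the simplest possible case where the softmax is just dropped. As far as I understand it, the linked paper applies a positive feature map to the queries and keys and adds a normalizer, both of which are left out here. The function names are made up for illustration. Without the softmax, the output at step t is sum over s <= t of (q_t . k_s) v_s, which can be regrouped as q_t times a running sum of outer products k_s v_s^T. That running sum is the recurrent state.

```python
def attention_parallel(Q, K, V):
    """Causal attention with the softmax removed:
    out[t] = sum over s <= t of (q_t . k_s) * v_s."""
    out = []
    for t, q in enumerate(Q):
        row = [0.0] * len(V[0])
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(q, K[s]))
            for j, v in enumerate(V[s]):
                row[j] += score * v
        out.append(row)
    return out


def attention_recurrent(Q, K, V):
    """Same computation written as an RNN.
    The state S accumulates the outer products k_t v_t^T,
    and each output is q_t^T S."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        # State update: S <- S + k v^T
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        # Readout: out_t = q^T S
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy inputs: 3 time steps, key and value dimension 2
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

print(attention_parallel(Q, K, V))
print(attention_recurrent(Q, K, V))
```

Both functions give the same outputs on the toy inputs. The recurrent version keeps a fixed-size state (d_k by d_v) no matter how long the sequence gets, which is the RNN-like part. With the softmax in place, the regrouping no longer works, because the weight for each (t, s) pair depends on all the other scores in row t.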