by tripplyons
485 days ago
There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which has a recurrent form. I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.
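
To make the recurrent-form claim concrete, here is a rough sketch in plain Python. It assumes causal attention with no softmax, no feature map, and no normalization, which is the simplest variant and not necessarily what any particular paper uses. The function names `parallel_form` and `recurrent_form` are made up for illustration.

```python
def parallel_form(Q, K, V):
    """Softmax-free causal attention: out[t] = sum over s <= t of (q_t . k_s) * v_s."""
    d_v = len(V[0])
    out = []
    for t in range(len(Q)):
        row = [0.0] * d_v
        for s in range(t + 1):  # causal mask: only positions up to t
            score = sum(q * k for q, k in zip(Q[t], K[s]))
            for j in range(d_v):
                row[j] += score * V[s][j]
        out.append(row)
    return out


def recurrent_form(Q, K, V):
    """Same output, computed with a running state S = sum over s <= t of outer(k_s, v_s)."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        # State update: S += outer(k, v)
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        # Readout: out_t = q^T S
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Tiny check that both forms agree on arbitrary inputs.
Q = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
K = [[0.0, 1.0], [2.0, 1.0], [-1.0, 4.0]]
V = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [5.0, -2.0, 0.0]]
a, b = parallel_form(Q, K, V), recurrent_form(Q, K, V)
assert all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

The two agree because without the softmax, (q_t . k_s) * v_s can be regrouped as q_t times the outer product of k_s and v_s, so the sum over past positions collapses into a single matrix that is updated one step at a time, like an RNN hidden state.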
Sorry, what?