There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which (with causal masking) can be rewritten as a recurrence: an RNN-style update over a fixed-size matrix state.
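To make the recurrent form concrete, here is a minimal NumPy sketch. It assumes the simplest variant: causal masking, no feature map and no normalization, so it is an illustration rather than any particular paper's formulation. The function names are my own. It computes softmax-free attention both the "attention" way (a masked T x T score matrix) and the "RNN" way (a running state), and checks that they agree.

```python
import numpy as np

def causal_linear_attention_parallel(Q, K, V):
    """Softmax-free causal attention, computed with a full T x T score matrix."""
    scores = np.tril(Q @ K.T)  # (T, T); position t only sees positions <= t
    return scores @ V          # (T, d_v)

def causal_linear_attention_recurrent(Q, K, V):
    """Same computation as a recurrence over a fixed-size (d_k, d_v) state."""
    T, d_k = K.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # running state, independent of sequence length
    out = np.empty((T, d_v))
    for t in range(T):
        S += np.outer(K[t], V[t])  # S_t = S_{t-1} + k_t v_t^T
        out[t] = Q[t] @ S          # o_t = q_t^T S_t
    return out

# Small random example (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
outputs_match = np.allclose(
    causal_linear_attention_parallel(Q, K, V),
    causal_linear_attention_recurrent(Q, K, V),
)
```

The two agree because, without the softmax, the output at step t is just the sum over s <= t of (q_t · k_s) v_s, and that sum can be accumulated one step at a time. With the softmax in place, the per-row normalization depends on all the scores at once, so this simple fixed-size state no longer works.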
I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.