| HN Mirror



	by immibis 531 days ago
	Transformers are completely unlike RNNs.

1 comments

tripplyons 531 days ago

There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which has a recurrent form.

I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.


kadushka 531 days ago

*If you remove the softmax from the attention formula, you end up with linear attention*

Sorry, what?


tripplyons 531 days ago

Here is a paper explaining it: https://arxiv.org/abs/2006.16236
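For anyone who doesn't want to open the paper, here is a rough sketch of the idea. It is not code from the paper, and it makes simplifying assumptions: causal masking, no feature map, and no normalization term. All function names are made up for illustration. Without the softmax, the output at step t is sum over s <= t of (q_t . k_s) v_s, which can be regrouped as q_t applied to a running sum S_t = S_{t-1} + k_t v_t^T. That running sum is the recurrent state.

```python
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(Q, K, V):
    """Softmax-free causal attention: out_t = sum_{s<=t} (q_t . k_s) v_s."""
    d_v = len(V[0])
    out = []
    for t, q in enumerate(Q):
        acc = [0.0] * d_v
        for s in range(t + 1):  # attend only to positions up to t
            w = dot(q, K[s])    # raw score, no softmax applied
            for j in range(d_v):
                acc[j] += w * V[s][j]
        out.append(acc)
    return out


def causal_attention_recurrent(Q, K, V):
    """Same outputs from a running state S_t = S_{t-1} + k_t v_t^T."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, like an RNN
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]     # rank-1 update with the new token
        # Read out: out_t[j] = sum_i q[i] * S[i][j]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, d_k, d_v = 6, 4, 3
Q = random_matrix(T, d_k, rng)
K = random_matrix(T, d_k, rng)
V = random_matrix(T, d_v, rng)

a = causal_attention_quadratic(Q, K, V)
b = causal_attention_recurrent(Q, K, V)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print("forms agree:", max_diff < 1e-9)
```

The first function does work that grows quadratically with sequence length, while the second carries a state of size d_k x d_v from one step to the next. With the softmax left in, the scores can't be factored out of the sum this way, so the same regrouping doesn't apply.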
