| HN Mirror


	by ottaborra 531 days ago
	RNN with extra steps?

2 comments

tripplyons 531 days ago

There are many papers that use a recurrence across sub-sequences and attention within sub-sequences. Google did this with Infini-Attention and one of the variants from the Titans paper. However, I think the earliest example of this is Transformer-XL.
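A toy sketch of the general pattern described here: attention inside each segment, with state carried from one segment to the next. It is loosely modeled on Transformer-XL's idea of caching the previous segment. It is not any of the named papers' method. It makes several simplifying assumptions: a single layer, queries/keys/values equal to the raw inputs (no learned projections or positional encodings), and a cache that is just the previous segment's inputs. The helper names `attend` and `segment_recurrent_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product softmax attention for a single query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def segment_recurrent_attention(xs: List[Vector], seg_len: int) -> List[Vector]:
    """Attention within each segment plus a cache carried across segments.

    Toy simplification (assumption): Q = K = V = raw inputs, one layer,
    and the carried state is simply the previous segment's inputs.
    """
    outputs: List[Vector] = []
    memory: List[Vector] = []  # recurrent state; empty before the first segment
    for start in range(0, len(xs), seg_len):
        segment = xs[start:start + seg_len]
        for i, x in enumerate(segment):
            # Each position sees the cached previous segment, plus itself and
            # earlier positions of the current segment (causal mask).
            context = memory + segment[: i + 1]
            outputs.append(attend(x, context, context))
        memory = segment  # hand the current segment forward as the new cache
    return outputs
```

With `seg_len` equal to the sequence length this reduces to ordinary causal softmax attention. With a shorter `seg_len`, each token only looks back as far as the start of the previous segment, and the cache is the only path for information to cross segment boundaries.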

biofox 531 days ago

Isn't that all of modern AI?

immibis 531 days ago

Transformers are completely unlike RNNs.

tripplyons 531 days ago

There are some interesting connections between them. If you remove the softmax from the attention formula, you end up with linear attention, which has a recurrent form.

I haven't read it, but the Mamba 2 paper claims to establish a stronger connection.
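The following pure-Python sketch makes the "recurrent form" concrete. It assumes single-head causal attention with the softmax dropped entirely: no feature map and no normalizer. The parallel form sums raw scores q_t·k_s over past positions. The recurrent form instead keeps a running state S_t = S_{t-1} + k_t v_t^T and reads out S_t^T q_t, so each step costs O(d_k·d_v) no matter how long the sequence is. Both produce the same outputs. The function names are illustrative only. Practical linear-attention models usually add a positive feature map and a normalizing denominator; Katharopoulos et al. (2020), for example, use elu(x)+1.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention_parallel(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Causal attention with the softmax removed: out_t = sum_{s<=t} (q_t . k_s) v_s."""
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            w = dot(q, ks[s])  # raw score, no softmax
            out = [o + w * v for o, v in zip(out, vs[s])]
        outputs.append(out)
    return outputs


def linear_attention_recurrent(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Same outputs, computed like an RNN with a d_k x d_v state matrix S."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_0 = 0
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # S_t = S_{t-1} + k_t v_t^T (outer-product update)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # out_t = S_t^T q_t
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs
```

The equivalence holds because (q_t·k_s) v_s = (k_s v_s^T)^T q_t, and a sum over s ≤ t of such terms can be accumulated one step at a time. With the softmax in place, the exponentiated scores do not factor this way, so standard attention has no fixed-size state of this kind.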

kadushka 531 days ago

*If you remove the softmax from the attention formula, you end up with linear attention*

Sorry, what?

tripplyons 531 days ago

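Here is a paper explaining it ("Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", Katharopoulos et al., 2020): https://arxiv.org/abs/2006.16236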