| HN Mirror



	by hansvm 514 days ago
	EA doesn't quite fit under the same umbrella. EA has a constant cache size (it's just another classical recurrent architecture inspired by approximating transformers), whereas this paper gives speedups to a variety of true attention mechanisms, which still require caches proportional to the sequence length.
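
To make the cache-size distinction concrete, here is a rough sketch, not EA's actual formulation and not the paper's method. It assumes a generic linear-attention-style recurrence as a stand-in for a "constant cache" architecture and compares it with plain softmax attention over a growing key/value cache. All function names (`softmax_attention_step`, `recurrent_step`, `cache_sizes`) and the toy inputs are made up for illustration.

```python
import math


def softmax_attention_step(q, k, v, cache):
    """True attention: append (k, v) to the cache, then attend over all of it."""
    cache.append((k, v))
    scores = [sum(qi * ki for qi, ki in zip(q, ck)) for ck, _ in cache]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * cv[j] for w, (_, cv) in zip(weights, cache)) / total
        for j in range(len(v))
    ]


def recurrent_step(q, k, v, state):
    """Assumed stand-in recurrence: fold (k, v) into a fixed d x d state."""
    for i, ki in enumerate(k):
        for j, vj in enumerate(v):
            state[i][j] += ki * vj  # running sum of outer products k v^T
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(len(v))]


def cache_sizes(seq_len, d=4):
    """Run both variants on toy inputs and count the floats each one keeps."""
    kv_cache = []
    state = [[0.0] * d for _ in range(d)]
    for t in range(seq_len):
        x = [math.sin(t + i) for i in range(d)]  # toy token vector
        softmax_attention_step(x, x, x, kv_cache)
        recurrent_step(x, x, x, state)
    kv_floats = sum(len(k) + len(v) for k, v in kv_cache)
    state_floats = sum(len(row) for row in state)
    return kv_floats, state_floats
```

With `d=4`, the key/value cache holds `2 * d * seq_len` floats and keeps growing with the sequence, while the recurrent state stays at `d * d` floats no matter how many tokens have been processed.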

1 comment

very succinct and insightful, thank you!