by hansvm 514 days ago
EA doesn't quite fit under the same umbrella. EA has a constant cache size (it's just another classical recurrent architecture inspired by approximating transformers), whereas this paper gives speedups to a variety of true attention mechanisms, which still require caches proportional to the sequence length.
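
To make the cache-size distinction concrete, here is a rough sketch (not EA's actual update rule, and not the method from the paper). It assumes a plain softmax-attention decoder with a key/value cache on one side and a generic linear-attention-style recurrence as a stand-in for a "constant cache" architecture on the other. The function names, the exp feature map, and the toy inputs are all made up for illustration.

```python
import math

def softmax_attention_step(q, k, v, cache):
    """One decoding step of softmax attention; the cache grows by one (k, v) pair."""
    cache.append((k, v))
    scores = [sum(qi * ki for qi, ki in zip(q, kj)) for kj, _ in cache]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * vj[i] for w, (_, vj) in zip(weights, cache)) / z
            for i in range(len(v))]

def recurrent_step(q, k, v, state):
    """One step of a generic linear-attention-style recurrence.

    Hypothetical stand-in for a constant-state architecture, NOT EA itself.
    The state is a fixed d x d matrix S plus a length-d normalizer z.
    """
    S, z = state
    fk = [math.exp(x) for x in k]  # positive feature map (an assumption)
    fq = [math.exp(x) for x in q]
    d = len(v)
    for i in range(d):
        z[i] += fk[i]
        for j in range(d):
            S[i][j] += fk[i] * v[j]
    denom = sum(fq[i] * z[i] for i in range(d))
    return [sum(fq[i] * S[i][j] for i in range(d)) / denom for j in range(d)]

def cache_sizes(n_tokens, d=4):
    """Run both decoders on toy inputs and count the floats each one stores."""
    kv_cache = []
    state = ([[0.0] * d for _ in range(d)], [0.0] * d)
    for t in range(n_tokens):
        x = [math.sin(t + i) for i in range(d)]  # toy token features
        softmax_attention_step(x, x, x, kv_cache)
        recurrent_step(x, x, x, state)
    kv_floats = sum(len(k) + len(v) for k, v in kv_cache)
    state_floats = sum(len(row) for row in state[0]) + len(state[1])
    return kv_floats, state_floats
```

With d=4, `cache_sizes(10)` gives 80 floats for the KV cache against 20 for the recurrent state, and `cache_sizes(100)` gives 800 against the same 20. Speeding up the first kind of mechanism doesn't change that linear growth, which is why the two don't really belong in the same bucket.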

very succinct and insightful, thank you!