by hansvm
514 days ago
EA doesn't quite fit under the same umbrella. EA has a constant cache size (it's just another classical recurrent architecture inspired by approximating transformers), whereas this paper gives speedups to a variety of true attention mechanisms, which still require caches proportional to the sequence length.
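To make the cache-size distinction concrete, here is a rough sketch in plain Python. It is not EA and not the paper's method: `softmax_attention_step` is ordinary softmax attention with a growing key/value cache, and `recurrent_step` is a generic linear-attention-style recurrence that I'm using as a stand-in for "constant-size state". All function names and the toy inputs are made up for illustration.

```python
import math


def softmax_attention_step(q, k, v, cache):
    """One decoding step of softmax attention. The cache gains one (k, v) pair per token."""
    cache.append((k, v))
    scores = [sum(qi * ki for qi, ki in zip(q, past_k)) for past_k, _ in cache]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * past_v[j] for w, (_, past_v) in zip(weights, cache)) / total
        for j in range(len(v))
    ]


def recurrent_step(q, k, v, state):
    """One step of a linear-attention-style recurrence (illustrative stand-in, not EA).

    The state is a fixed d_k x d_v matrix updated in place with the outer product k v^T.
    """
    for i, ki in enumerate(k):
        for j, vj in enumerate(v):
            state[i][j] += ki * vj
    return [sum(q[i] * state[i][j] for i in range(len(q))) for j in range(len(v))]


def cache_sizes(seq_len, d=4):
    """Return (kv_cache_floats, recurrent_state_floats) after each of seq_len steps."""
    cache = []
    state = [[0.0] * d for _ in range(d)]
    sizes = []
    for t in range(seq_len):
        x = [float((t + i) % 3) for i in range(d)]  # arbitrary toy token vector
        softmax_attention_step(x, x, x, cache)
        recurrent_step(x, x, x, state)
        kv_floats = sum(len(k) + len(v) for k, v in cache)
        state_floats = sum(len(row) for row in state)
        sizes.append((kv_floats, state_floats))
    return sizes
```

Calling `cache_sizes(n)` shows the first number growing linearly with `n` while the second stays at `d * d`, which is the difference I mean between "true attention" and a recurrent architecture.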