by tshadley
384 days ago
The goal here is not to replace transformers but to combine them with an RNN, so you get both good short-term memory (self-attention) and much improved long-term memory (ATLAS recurrent memory). "Empirically, our models—OmegaNet, Atlas, DeepTransformers, and Dot—achieve consistent improvements over Transformers and recent RNN variants across diverse benchmarks."