HN Mirror



	by tshadley 384 days ago
	The goal here is not to replace transformers but to combine them with an RNN, so you get both good short-term memory (self-attention) and much-improved long-term memory (Atlas's recurrent memory). "Empirically, our models—OmegaNet, Atlas, DeepTransformers, and Dot—achieve consistent improvements over Transformers and recent RNN variants across diverse benchmarks."
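	To make the short-term/long-term split concrete, here's a toy sketch (Python, stdlib only). Assumptions, loudly: all function names are hypothetical; the short-term path is plain causal sliding-window softmax attention; the long-term path is a generic decayed outer-product memory (linear-attention style), which is NOT the Atlas update rule; and the two paths are simply summed, whereas real hybrids may gate or interleave them.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sliding_window_attention(qs, ks, vs, window: int):
    """Short-term memory: causal softmax attention over only the last `window` tokens."""
    out = []
    for t in range(len(qs)):
        start = max(0, t - window + 1)
        scale = math.sqrt(len(qs[t]))
        scores = [dot(qs[t], ks[j]) / scale for j in range(start, t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        o = [0.0] * len(vs[0])
        for w, j in zip(weights, range(start, t + 1)):
            for i in range(len(o)):
                o[i] += (w / z) * vs[j][i]
        out.append(o)
    return out


def recurrent_memory(qs, ks, vs, decay: float = 0.9):
    """Long-term memory: fixed-size matrix state M (d_v x d_k).

    Update: M <- decay * M + v k^T. Read-out: M q.
    Hypothetical generic rule for illustration, not Atlas's memory update.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    M = [[0.0] * d_k for _ in range(d_v)]
    out = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_v):
            for j in range(d_k):
                M[i][j] = decay * M[i][j] + v[i] * k[j]
        out.append([dot(M[i], q) for i in range(d_v)])
    return out


def hybrid_layer(qs, ks, vs, window: int = 2, decay: float = 0.9):
    """Sum the short-term (attention) and long-term (recurrent) outputs per token."""
    short = sliding_window_attention(qs, ks, vs, window)
    long_ = recurrent_memory(qs, ks, vs, decay)
    return [[a + b for a, b in zip(s, l)] for s, l in zip(short, long_)]


if __name__ == "__main__":
    qs = ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    out = hybrid_layer(qs, ks, vs, window=2, decay=0.9)
    print(out[0])  # [2.0, 4.0]: the first token sees only itself in both paths
```

The point of the split: the attention path costs O(window) per token and forgets anything older than the window, while the recurrent state stays O(d_k * d_v) no matter how long the sequence gets, so it's the only place older context can survive.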