| HN Mirror

	by cgel 230 days ago
	We have trained a completely attention-free LLM whose performance is competitive with state-of-the-art models. This model, which we call Brumby-14B-Base, has a familiar Transformer-style architecture, except that it uses power retention layers in place of attention layers. It is available on Hugging Face.
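
	The post does not define power retention, so the following is only a rough sketch of the general idea that such a layer's name suggests, not Brumby's actual implementation. It assumes an unnormalized causal layer whose weights are (q . k)^2 rather than a softmax, and shows that the same outputs can be computed recurrently from a fixed-size state. The function names, the degree 2, and the omission of normalization and gating are all assumptions made for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(x):
    # Degree-2 feature map (flattened outer product), chosen so that
    # dot(phi(q), phi(k)) == dot(q, k) ** 2. Hypothetical helper.
    return [a * b for a in x for b in x]

def attention_form(Q, K, V):
    # Quadratic-time form: out_t = sum over s <= t of (q_t . k_s)^2 * v_s
    out = []
    for t, q in enumerate(Q):
        acc = [0.0] * len(V[0])
        for s in range(t + 1):
            w = dot(q, K[s]) ** 2
            for j, v in enumerate(V[s]):
                acc[j] += w * v
        out.append(acc)
    return out

def recurrent_form(Q, K, V):
    # Linear-time form: keep a state S = sum over s <= t of phi(k_s) v_s^T,
    # whose size does not depend on the sequence length.
    d_feat, d_v = len(Q[0]) ** 2, len(V[0])
    S = [[0.0] * d_v for _ in range(d_feat)]
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for i in range(d_feat):
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        out.append([sum(fq[i] * S[i][j] for i in range(d_feat))
                    for j in range(d_v)])
    return out

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

if __name__ == "__main__":
    random.seed(0)
    T, d, d_v = 6, 3, 2  # toy sizes, chosen arbitrarily
    rand = lambda n: [random.uniform(-1, 1) for _ in range(n)]
    Q = [rand(d) for _ in range(T)]
    K = [rand(d) for _ in range(T)]
    V = [rand(d_v) for _ in range(T)]
    # The two forms should agree up to floating-point error.
    print(max_abs_diff(attention_form(Q, K, V), recurrent_form(Q, K, V)))
```

	In this toy version the recurrent state has d^2 * d_v entries regardless of how many tokens have been processed, which is the usual appeal of replacing attention with a recurrence. A real layer would presumably add normalization and other details that the post does not describe.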