Gated Attention for Large Language Models

	Gated Attention for Large Language Models (arxiv.org)
	1 point by xnhbx 244 days ago