HN Mirror

	by chaitjo 2285 days ago
	Indeed, the two papers came out within months of each other, iirc. The GAT paper discusses Transformers in the context of stabilizing the learning of attention mechanisms. Of course, this connection may be trivial to most people, but I hadn't seen a post on it before, so I decided to write one for myself as I studied these architectures.