Hacker News
by chaitjo 2285 days ago
Indeed, the two papers came out within months of each other iirc. The GAT paper discusses Transformers in the context of stabilizing the learning of attention mechanisms.

Of course, this connection may be trivial to most people, but I hadn't seen a post on this before. So I decided to write one for myself as I studied these architectures.