by chaitjo
2285 days ago
Indeed, the two papers came out within months of each other iirc. The GAT paper discusses Transformers in the context of stabilizing the learning of attention mechanisms. Of course, this connection may be trivial to most people, but I hadn't seen a post on this before. So I decided to write one for myself as I studied these architectures.