by dsubburam
1063 days ago
'Formal Algorithms for Transformers' [1] is a proper account of the architectures and what tasks they naturally lend themselves to, by authors from DeepMind. See sections 3 (Transformers and Typical Tasks) and 6 (Transformer Architectures). Not much on empirical observations, though.

[1] https://arxiv.org/abs/2207.09238