Hacker News
by dsubburam 1063 days ago
'Formal Algorithms for Transformers'[1], by authors from DeepMind, is a proper account of transformer architectures and the tasks they naturally lend themselves to. See sections 3 (Transformers and Typical Tasks) and 6 (Transformer Architectures).

Not much on empirical observations, though.

[1] https://arxiv.org/abs/2207.09238

ty!