|
|
|
|
|
by Buttons840
1167 days ago
|
|
I've been reading this paper with pseudocode for various transformers and finding it helfpul: https://arxiv.org/abs/2207.09238 "This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (not results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models." |
|