by enriquto 895 days ago
For a drier, more formal and succinct approach, see "The Transformer Model in Equations" [0], by John Thickstun. The whole thing fits on a single page, using standard mathematical notation.

[0] https://johnthickstun.com/docs/transformers.pdf

2 comments

Finally, thank you so much! Was it so difficult? Aren't 7 lines of mathematical notation way better than pages of qualitative pub talk? I don't really understand these ML researchers; it always looks like they have never studied mathematics at all.
Thank god. I've had to cobble something like this together for my own notes a couple of times while trying to parse papers, and I was never quite sure whether I was missing something.