by tomasff 1098 days ago
https://arxiv.org/abs/2207.09238

This is a great, transparent, and clear description of what is actually going on in attention layers.

Alternatively, this article by Jay Alammar is also very popular: http://jalammar.github.io/illustrated-transformer/

1 comment

I commented yesterday, but came back to say this. Just found it, and it's really good. I like reading descriptions better than watching videos of them, so this is great.