This is a great, clear, and transparent description of what is actually going on in attention layers.
Alternatively, this article by Jay Alammar is very popular: http://jalammar.github.io/illustrated-transformer/