You can imagine the classic attention mechanism as a lookup table, actually.
Transformers are layers and layers and layers of lookup tables.