by psaccounts
159 days ago
I published a video that explains self-attention and multi-head attention in a different way: it goes from intuition, to math, to code, starting from the end result and walking backward to the actual method. Hopefully this sheds light on this important topic in a way that differs from other approaches and provides the clarity needed to understand the Transformer architecture. The explanation starts at 41:22 in the video below. https://youtu.be/6jyL6NB3_LI?t=2482