by psaccounts 205 days ago

I published a video that explains self-attention and multi-head attention in a different way: it goes from intuition, to math, to code, starting from the end result and walking backward to the actual method.

Hopefully this sheds light on this important topic in a way that differs from other approaches and provides the clarity needed to understand the Transformer architecture. The explanation starts at 41:22 in the video below.

https://youtu.be/6jyL6NB3_LI?t=2482