Hacker News
by odnes 2125 days ago
These 4 videos (~45 mins) do an excellent job of explaining attention, multi-headed attention, and transformers: https://www.youtube.com/watch?v=yGTUuEx3GkA
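For anyone who wants something concrete to poke at alongside the videos, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, plus a bare-bones multi-head wrapper. It is not taken from the videos. The helper names (`softmax`, `matmul`, `transpose`, `attention`, `multi_head_attention`) are hypothetical. The sketch assumes plain lists of floats, with no batching, no masking, and no learned output projection.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using lists of lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # one row of scores per query
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted average of V's rows

def multi_head_attention(x, heads):
    """Run attention once per head and concatenate the results.

    heads: list of (W_q, W_k, W_v) projection matrices, one tuple per head.
    Assumption: the final output projection W_o is omitted for brevity.
    """
    outputs = [attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
               for w_q, w_k, w_v in heads]
    # Concatenate each token's per-head outputs along the feature axis.
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]
```

In a real transformer these would be batched tensor operations with learned projections, an output projection, and optional masking; the list-of-lists version only makes the arithmetic visible. Each head projects the same inputs differently, which is what lets multiple heads attend to different relationships in parallel.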