by NoobSaibot135 941 days ago
Here’s another awesome Karpathy lecture from Stanford:

https://youtu.be/XfpMkf4rD6E?si=1_EmuYDFfi7RNEhz

This video is the best for learning attention, specifically where he explains:

Think of attention like a directed graph of vectors passing messages to each other

Keys are what other tokens advertise about themselves to you,

Queries are what you are interested in,

and Values are what you are projecting out yourself.

When you matrix-multiply the queries by the transposed keys (Q x K^T), each entry of the result measures the interestingness, or affinity, between one token's query and another token's key.
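
For anyone who wants to see that in code, here is a rough NumPy sketch of a single attention head. It is my own illustration, not code from the lecture. The function name `attention`, the weight matrices `w_q`, `w_k`, `w_v`, and the toy sizes are all made up for the example, and the scaling by sqrt(d) and the softmax are assumptions taken from standard scaled dot-product attention rather than anything stated above. There is no causal mask, so every token can attend to every other token.

```python
import numpy as np

def attention(x, w_q, w_k, w_v):
    """Single-head attention over a sequence x of shape (T, d_model)."""
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token advertises
    v = x @ w_v  # values: what each token passes along

    # Affinity between every query and every key, shape (T, T).
    # Dividing by sqrt(d_head) is the usual scaling (an assumption here).
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Row-wise softmax: each row becomes the weights on the incoming edges
    # of one node in the "directed graph" picture.
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each token's output is a weighted sum of the values it attends to.
    return weights @ v, weights

# Toy example with hypothetical sizes: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
T, d_model, d_head = 4, 8, 8
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = attention(x, w_q, w_k, w_v)
```

Row i of `weights` says how much token i listens to each other token, and each row sums to 1. Adding a causal mask would mean setting the scores for future positions to negative infinity before the softmax.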