Hacker News new | ask | show | jobs
by necroforest 1120 days ago
To answer (2): You are token i. In order to see how much of a token j's value v_j you update yourself with, you compare your query q_i with token j's key k_j. This gives you the asymmetry between queries and keys.

This is even more apparent in a cross-attention setting, where one stream of tokens will have only queries associated with it and the other will have only keys/values.