|
|
|
|
|
by necroforest
1120 days ago
|
|
To answer (2): You are token i. In order to see how much of a token j's value v_j you update yourself with, you compare your query q_i with token j's key k_j. This gives you the asymmetry between queries and keys. This is even more apparent in a cross-attention setting, where one stream of tokens will have only queries associated with it and the other will have only keys/values. |
|