by ActorNightly
175 days ago
>A single convolution step is a local operation (only pulling from nearby pixels), whereas attention is a "global" operation. In the same way that the learned weights used to generate the K, Q, V matrices may have zeros (or small values) for referencing certain tokens, convolution kernels just have zeros fixed by design everywhere outside their window.
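
To make the comparison in the quote concrete, here is a rough NumPy sketch, not anything from the thread itself. It writes a 1D convolution as a full n x n mixing matrix (zeros are forced outside the kernel window) and compares it with a softmax attention matrix (every entry is free to be nonzero). The helper names `conv_as_matrix` and `attention_weights`, the 3-tap kernel, and the random projection weights are all assumptions made up for illustration.

```python
import numpy as np

def conv_as_matrix(kernel, n):
    """Hypothetical helper: the n x n matrix equivalent to a 1D
    zero-padded 'same' convolution (cross-correlation form)."""
    half = len(kernel) // 2
    M = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(kernel):
            col = i + j - half
            if 0 <= col < n:      # positions outside the window stay exactly 0
                M[i, col] = w
    return M

def attention_weights(x, Wq, Wk):
    """Hypothetical helper: single-head softmax attention weights."""
    q, k = x @ Wq, x @ Wk
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 4                                  # assumed toy sizes
C = conv_as_matrix(np.array([0.25, 0.5, 0.25]), n)
x = rng.normal(size=(n, d))                  # stand-in token embeddings
A = attention_weights(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))

print(np.count_nonzero(C), "of", C.size, "conv entries are nonzero")
print(np.count_nonzero(A), "of", A.size, "attention entries are nonzero")
```

`C` is a band around the diagonal, so position 0 can never read from position 3 in a single step. `A` has no structural zeros: any entry can be small, but only because the learned weights and the input made it so.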