|
|
|
|
|
by QuadmasterXLII
810 days ago
|
|
The tokens are the structure over which the attention mechanism is permutation equivariant. This structure permeates the forward pass, its important at every layer and will be until we find something better than attention. |
|