|
|
|
|
|
by immibis
710 days ago
|
|
Transformers reduce the number of relationships between tokens that must be learned, too. An MLP has to separately learn all possible relationships between token 1 and 2, and 2 and 3, and 3 and 4. A transformer can learn relationships between specific values regardless of position. |
|