by imtringued
244 days ago
Only being able to run transformers is a silly concept, because attention boils down to two matrix multiplications (the query-key scores, then the weighted sum over the values), and matrix multiplication is the standard operation in feed-forward and convolutional layers. Basically, if you can run those, you get transformers for free.
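To make the point concrete, here is a rough sketch in plain Python, assuming single-head scaled dot-product attention with no masking or learned projections. The helper names (`matmul`, `attention`, `feed_forward`) are made up for illustration and are not from any library. The same `matmul` routine serves both the attention step and a bias-free feed-forward layer:

```python
import math

def matmul(a, b):
    """Plain matrix multiply on lists of rows (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax; elementwise work, not a matmul."""
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Single-head scaled dot-product attention (assumed variant)."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # matmul 1: Q K^T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)  # matmul 2: weights times V

def feed_forward(x, w):
    """A bias-free linear layer: the very same matmul."""
    return matmul(x, w)
```

Everything outside the two `matmul` calls is cheap elementwise work (scaling and softmax), which is why hardware that is good at matrix multiplication for feed-forward layers covers attention as well.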