by vore
1232 days ago
That's not what a transformer model is: a transformer model is just one that uses self-attention blocks in its layers to encode contextual information about the input. A non-transformer model can equally translate from one representation to another: e.g. before transformer models, RNNs were a commonly used architecture for seq2seq models.
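To make "self-attention" concrete, here's a rough sketch of a single attention step in plain Python. It is heavily simplified: one head, no learned query/key/value projections (I'm assuming Q = K = V = the input, which a real transformer layer would not do), and no masking or positional encoding. The function names `softmax` and `self_attention` are just ones I picked for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: identity projections, so Q = K = V = x.
    x is a list of token vectors, all of the same dimension d.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # New representation: attention-weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)
```

Each output vector is a weighted mix of every input vector, which is the "contextual information" part. An RNN gets context differently, by carrying a hidden state from one step to the next, but both can sit inside a seq2seq model.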