Hacker News new | ask | show | jobs
by mykeliu 3217 days ago
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
1 comments

Nah, the paper explicitly states that their system is not recurrent nor convolutional:

> To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using RNNs or convolution.