by solomatov 1252 days ago
My understanding is that RNNs aren't worse than Transformers per se; they are just slower to train. Transformers use the GPU much more efficiently, i.e. much more of the computation can be run in parallel.
2 comments

Also slower to run inference on. RNNs have to process their input much more sequentially.
We also don't have evidence that they scale the way Transformers do.
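
To make the sequential-vs-parallel point concrete, here is a rough toy sketch in plain Python. Everything in it is an illustrative assumption and not taken from any real model: the scalar hidden state, the weights `w_h` and `w_x`, and the simplification that queries, keys, and values are all just the input value. It only shows the dependency structure, not real performance.

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in xs:
        # Step t needs h from step t-1, so this loop cannot be parallelized
        # across time steps.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attention_output(xs, t):
    """Toy causal self-attention at position t, with q = k = v = x (assumption)."""
    scores = [xs[t] * xs[j] for j in range(t + 1)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * xs[j] for j, w in enumerate(weights)) / total

def attention_all(xs):
    # Each position reads only the raw inputs, never another position's
    # output, so these calls are independent and could run in parallel.
    return [attention_output(xs, t) for t in range(len(xs))]
```

In the RNN loop every step waits on the previous hidden state, while the attention outputs can be computed in any order (or all at once on a GPU) because none of them depends on another position's result.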