Hacker News
by
whimsicalism
806 days ago
transformers were also just better at the LM task than 2018 RNNs for an equal amount of training FLOPs
VHRanger
806 days ago
Yeah, that's just the training stability part to my knowledge
whimsicalism
806 days ago
RNNs are also just less capable models. like, just adding attention on top of an RNN made it a lot better
SpaceManNabs
806 days ago
Calculating self-attention is still quadratic in sequence length though. So you get the negatives of transformers there too.
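A rough sketch of where the quadratic cost comes from, not a real implementation: every token scores itself against every other token, so n tokens give n * n scores. The function name `self_attention` and the choice to use the inputs directly as queries, keys and values (no learned projections) are assumptions made for illustration.

```python
import math

def self_attention(x):
    """Naive single-head self-attention over a list of token vectors.

    Illustrative assumption: queries, keys and values are the raw inputs,
    with no learned projection matrices.
    Returns (outputs, number_of_pairwise_scores_computed).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    pair_count = 0
    for q in x:  # one pass per query token
        # Scaled dot product of this query with every key: n scores per query.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        pair_count += len(scores)
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, x)) / total for j in range(d)]
        )
    return outputs, pair_count

# Doubling the sequence length quadruples the number of scores.
for n in (4, 8, 16):
    _, pairs = self_attention([[1.0, 0.0]] * n)
    print(n, pairs)
```

The same n * n score computation shows up if you bolt this onto an RNN's hidden states, which is the point about inheriting the transformer's cost.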