by thentherewere2
905 days ago
RNNs and LSTMs did this as well, but they cannot be trained in parallel across a sequence, because each token has to be compressed into the hidden state one step at a time. Transformers ate their lunch. Newer methods are going back to similar concepts while trying to get past those earlier bottlenecks, using what we've learned from transformers since then.
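To make the sequential bottleneck concrete, here is a toy sketch of the recurrence, not any particular library's RNN. The function name `rnn_compress` and the scalar weights `w_h` and `w_x` are made up for illustration; a real RNN uses weight matrices and vector states.

```python
import math

def rnn_compress(tokens, w_h=0.5, w_x=1.0):
    """Fold a sequence of numbers into one fixed-size hidden state.

    Hypothetical scalar RNN: w_h and w_x are illustrative weights,
    not taken from any real model.
    """
    h = 0.0
    for x in tokens:
        # Step t needs the hidden state from step t-1, so this loop
        # cannot be split across time steps and run in parallel.
        h = math.tanh(w_h * h + w_x * x)
    return h

# The whole history ends up in a single number, and order matters.
print(rnn_compress([1.0, 0.0]))
print(rnn_compress([0.0, 1.0]))
```

However long the input is, everything has to fit into `h`, and you only get `h` at step t after finishing step t-1. That dependency chain is what attention avoids during training.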