|
|
|
|
|
by mkaic
1152 days ago
|
|
RWKV is an attention-free architecture that's showing promising scaling at a similar level to Transformers right now! There's also recently been Hyena, which uses a new mechanism that's kind of a weird mix of attention, convolution, and implicit modelling all at once. It's shown promise as well. Remains to be seen if these competing methods will truly scale as well as Transformers, but I've got my fingers crossed. Only a matter of time! I agree that "near future" is quite ambiguous though. If I were to disambiguate my claims, I think I'd personally expect a Transformer-killing architecture to arise in the next 4-5 years. |
|