by shawntan
536 days ago
Although marketed as such, RWKV isn't really an RNN. In the recent RWKV7 incarnation, you could argue it's a type of linear RNN, but past versions took their previous state from a lower layer. That allows for parallelism, but makes the computation closer to a convolution than a recurrence. As for 1), I'd like to believe so, but it's hard to get people away from the addictive drug that is the easily parallelised transformer. As for 2), (actual) RNNs and attention mechanisms seem to me fairly powerful (expressivity-wise) and perhaps the most acceptable to the community.
It's possible we'll start seeing more blended versions of RNN/attention architectures exploring different LLM properties.
In particular, the Aaren architecture in the former paper "can not only (i) be trained in parallel (like Transformers) but also (ii) be updated efficiently with new tokens, requiring only constant memory for inferences (like traditional RNNs)."
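
To make the "constant memory" half of that quote concrete, here is a rough sketch of how softmax attention for one fixed query can be updated token by token while keeping only a running max, a numerator, and a denominator. This is my own illustration of the general idea, not the paper's code: the names `StreamingAttention` and `attention_batch` are made up for this example, and the parallel-training side (which would need a prefix scan) is not shown.

```python
import math


def attention_batch(query, keys, values):
    """Reference: ordinary softmax attention for one query over all tokens."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


class StreamingAttention:
    """Hypothetical helper: the same attention output, updated one token at a time.

    The state is (max_score, numerator, denominator), whose size does not
    grow with the number of tokens seen.
    """

    def __init__(self, query, value_dim):
        self.query = list(query)
        self.max_score = -math.inf
        self.numerator = [0.0] * value_dim
        self.denominator = 0.0

    def update(self, key, value):
        score = sum(q * k for q, k in zip(self.query, key))
        new_max = max(self.max_score, score)
        # Rescale the old sums to the new max; this is 0.0 on the first token.
        old_scale = math.exp(self.max_score - new_max)
        new_weight = math.exp(score - new_max)
        self.numerator = [n * old_scale + new_weight * v
                          for n, v in zip(self.numerator, value)]
        self.denominator = self.denominator * old_scale + new_weight
        self.max_score = new_max
        return [n / self.denominator for n in self.numerator]


query = [0.5, -0.25]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

stream = StreamingAttention(query, value_dim=2)
for key, value in zip(keys, values):
    streamed_output = stream.update(key, value)

batch_output = attention_batch(query, keys, values)
```

After the last token, `streamed_output` should match `batch_output` up to floating-point error, even though the streaming version never stores past keys or values.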