by shawntan 805 days ago
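Not sure if this is the type of answer you're looking for, but RWKV is not really recurrent in the same way RNNs are. Its state update is (roughly) linear in the previous state, with no nonlinearity applied between time steps. This quasi-recurrence lets it and its comrades use algorithms like parallel scan to process a length-N sequence in O(log N) sequential steps when parallelised, instead of the N steps a classic RNN needs. But you pay for that in state-tracking ability.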
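To make the scan point concrete, here's a toy sketch in plain Python. It is not RWKV's actual formulation; it assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t, and the names `combine`, `sequential_scan` and `parallel_scan` are made up for illustration. The trick is that each step is an affine map of h, and composing affine maps is associative, so prefixes can be combined in a tree rather than one by one.

```python
def combine(left, right):
    # Compose two affine maps h -> a*h + b, with `left` applied first.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a, b, h0=0.0):
    # Reference implementation: N dependent steps, like a classic RNN loop.
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def parallel_scan(a, b, h0=0.0):
    # Hillis-Steele inclusive scan. Within one round every element is
    # updated independently of the others, so with N processors each
    # round is one parallel step; there are ceil(log2 N) rounds.
    # (Here the rounds are simulated with a list comprehension.)
    elems = list(zip(a, b))
    n = len(elems)
    rounds = 0
    d = 1
    while d < n:
        elems = [
            elems[i] if i < d else combine(elems[i - d], elems[i])
            for i in range(n)
        ]
        d *= 2
        rounds += 1
    # elems[i] now holds the composed map for steps 0..i; apply it to h0.
    return [a_t * h0 + b_t for a_t, b_t in elems], rounds
```

Both functions return the same states, but `parallel_scan` gets there in 3 rounds for N = 8. This only works because the update is linear in h. Something like h_t = tanh(w * h_{t-1} + x_t) can't be collapsed into a fixed-size `combine` this way, which is where the state-tracking trade-off comes from.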

There's a cool talk here if you care to know the details: https://www.youtube.com/watch?v=4-VXe1yPDjk