Hacker News
by naasking 1210 days ago
So maybe RWKV [1] is the next step. It parallelizes even better and seems to have no fixed limit on sequence length.

[1] https://github.com/BlinkDL/RWKV-LM