by tysam_and 1000 days ago
It's an RNN, there is no N^2 component over time.

It only requires the previous state.
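To make the "only the previous state" point concrete, here is a minimal sketch of a generic recurrent update. It is a toy elementwise recurrence, not RWKV's actual formulation; `rnn_step`, `run_rnn`, `attention_pair_count`, and the weights are made-up names and values for illustration only. The point is that each new token touches a fixed-size state, whereas causal self-attention scores every query against every earlier key.

```python
import math

def rnn_step(state, x, w_state=0.5, w_input=0.5):
    """One step of a toy elementwise recurrence (hypothetical, NOT RWKV's real update).

    The new state depends only on the previous state and the current input,
    so the cost per token is O(state_size), independent of sequence length.
    """
    return [math.tanh(w_state * s + w_input * xi) for s, xi in zip(state, x)]

def run_rnn(inputs, state_size):
    """Process a whole sequence while keeping only one fixed-size state in memory."""
    state = [0.0] * state_size
    for x in inputs:
        state = rnn_step(state, x)
    return state

def attention_pair_count(n_tokens):
    """Query-key pairs scored by causal self-attention: n(n+1)/2, i.e. quadratic in n."""
    return n_tokens * (n_tokens + 1) // 2

if __name__ == "__main__":
    seq = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
    print(run_rnn(seq, state_size=2))
    print(attention_pair_count(len(seq)))
```

With this shape, doubling the sequence length doubles the number of `rnn_step` calls, while the attention pair count roughly quadruples.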

(There's a Discord; you should join it if you have further questions! Unfortunately I'm not as informed as I should be on this one, beyond the fact that it is _very_ mobile friendly.) The performance difference is slight, and not too bad really, all things considered. And I think it comes out on top for raw efficiency per parameter/FLOP, IIRC.

An interesting concept, for sure! :'DDDD :'))))

1 comment

Sigh. Do discussions about RWKV always end with suggestions that I join the Discord? If I do join the Discord, will I soon begin suggesting that others join the Discord as well? What I mean is, I've seen this come up a few times on HN and discussions usually end prematurely with suggestions to join the Discord. [0]

If this technique is good, I'll wait until I can learn about it without joining the Discord.

[0]: https://news.ycombinator.com/item?id=35508692