by jeremycochoy 868 days ago
I believe RWKV is actually an architecture that can be used for encoding: given an LSTM/GRU, you can simply take the last state as an encoding of your sequence. The same should be possible with RWKV, right?
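To make the "last state as encoding" idea concrete, here is a rough sketch in plain Python. It uses a toy tanh RNN as a stand-in, not the real LSTM, GRU, or RWKV update rule, and the names `make_rnn`, `step`, and `encode` are made up for illustration. The only point is that the final fixed-size state is returned as the sequence encoding, whatever the recurrence is.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Build random weights for a toy tanh RNN (hypothetical stand-in cell)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    return w_in, w_h


def step(params, state, x):
    """One recurrent update: new_state = tanh(W_in @ x + W_h @ state)."""
    w_in, w_h = params
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hi for w, hi in zip(w_h[j], state))
        )
        for j in range(len(state))
    ]


def encode(params, sequence):
    """Run the recurrence over the whole sequence and return the last state."""
    hidden_size = len(params[1])
    state = [0.0] * hidden_size  # initial state is all zeros
    for x in sequence:
        state = step(params, state, x)
    return state


params = make_rnn(input_size=2, hidden_size=4)
short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

enc_short = encode(params, short_seq)
enc_long = encode(params, long_seq)
print(len(enc_short), len(enc_long))
```

Both sequences end up as a vector of length 4 regardless of how many steps they have, which is what makes the final state usable as an encoding. Swapping `step` for a different recurrent update would leave `encode` unchanged.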