Hacker News
by LoganDark 1071 days ago
> RWKV

The current versions of RWKV slowly go insane when exposed to sequences that are too long, because the recurrent state gradually drifts away from anything the model saw in training once the sequence grows past the context length it was trained with. The developers are experimenting with ways to avoid this, though: https://github.com/Blealtan/RWKV-LM-LoRA/tree/dev-infctx
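
To make the drift idea concrete, here is a toy sketch. It is not RWKV's actual state update. It assumes a generic accumulator state fed with random inputs, and the names `state_norms`, `first_step_outside_training_range`, and `train_ctx_len` are made up for illustration. It only shows how a recurrent state can wander into magnitudes that never occurred within the training context length:

```python
import math
import random


def state_norms(num_steps, decay=1.0, dim=16, seed=0):
    """Run a toy accumulator recurrence and return the state norm after each step.

    Hypothetical stand-in for a recurrent state; NOT the RWKV update rule.
    """
    rng = random.Random(seed)
    state = [0.0] * dim
    norms = []
    for _ in range(num_steps):
        # Each step scales the old state by `decay` and adds a random input.
        state = [decay * s + rng.gauss(0.0, 1.0) for s in state]
        norms.append(math.sqrt(sum(s * s for s in state)))
    return norms


def first_step_outside_training_range(norms, train_ctx_len):
    """Return the first 0-based step past train_ctx_len whose norm exceeds
    every norm seen within the first train_ctx_len steps, or None."""
    seen_max = max(norms[:train_ctx_len])
    for step in range(train_ctx_len, len(norms)):
        if norms[step] > seen_max:
            return step
    return None


if __name__ == "__main__":
    norms = state_norms(4096)
    print(first_step_outside_training_range(norms, train_ctx_len=256))
```

In this toy, a model that only ever saw the first 256 steps has no experience with the larger states that show up later, which is roughly the kind of mismatch I mean. Whether the real RWKV state drifts in norm or in some other way is not something this sketch can tell you.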

1 comment

Can you share more details about the divergence?