HN Mirror



	by LoganDark 1119 days ago
	> RWKV

	The current versions of RWKV slowly go insane when exposed to sequences that are too long, because the state gradually diverges once you go past the context length used in training. They are experimenting with ways to avoid this, though: https://github.com/Blealtan/RWKV-LM-LoRA/tree/dev-infctx

1 comment

Can you share more details about the divergence?