Hacker News
by nl 1247 days ago
For those wondering how on earth they are getting decent results from an RNN without long-range forgetting, I don't really know either!

But they reference https://arxiv.org/abs/2105.14103 and the bottom section of https://github.com/BlinkDL/RWKV-LM has an explainer.