Hacker News
by logicchains 1144 days ago
Nope, not yet. The current 14B version is much worse than LLaMA-65B. But there are apparently plans to train a RWKV-65B by the end of the year, and if including the LLaMA training dataset results in something like LLaMA-65B but with infinite context, then that'd be really amazing.