| HN Mirror


	by logicchains 1144 days ago
	Nope, not yet. The current 14B version is much worse than LLaMA 65B. But there are apparently plans to train an RWKV-65B by the end of the year, and if including the LLaMA training dataset results in something like LLaMA-65B but with infinite context, that'd be really amazing.