| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by FallDead 1166 days ago
	one of the authors here!, I think someone in our discord did experiments to prove that it does work for longer contexts, The pace of this work moves really fast. This might have been an earlier models in the series. RWKV it needs to be trained for longer contexts lengths in order to obtain that skill a context tuning if you will. IRCC there will be a follow up paper for it.

1 comments

Thank you for taking the time to comment here!

That sounds promising. Maybe scale (of model and training samples) is all you need.

And RNNs are obviously so much more efficient at inference.

I'm going to take a closer look :-)