
Hacker News


	by visarga 1494 days ago
	> Wow. Transformers are very limited in the size of their attention window: they can take a few thousand tokens at most. Your data might not fit into that window, and you also don't want to have to fine-tune the model. This paper offers a solution.