| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mnicky 109 days ago
	This observation makes sense, because all models currently probably use some kind of a sparse attention architecture. So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved.