| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by p1esk 847 days ago
	Models can be sensitive to the location of a needle in the haystack of its input block. But only if we use some sort of attention optimization. For the quadratic attention algo it shouldn’t matter where the needle is, right?