
	by npsomaratna 1056 days ago
	My understanding is that with NTK-aware RoPE scaling, the model does pay uniform attention across the context. With older methods, not as much.