Hacker News


	by jimmyl02 482 days ago
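	Large context windows generally involve RoPE (rotary position embeddings) [0], which encodes each token's position as a rotation of the query and key vectors, so attention scores depend on the relative position of tokens. Combined with scaling tricks such as position interpolation, this lets a model be trained on a smaller window and then run on a larger one at inference. It seems like they have a new "iRoPE", which might perform better?

	[0] https://arxiv.org/pdf/2104.09864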
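	For anyone curious how the rotation works, here is a minimal Python sketch of the pairwise RoPE rotation described in [0], assuming the paper's frequency schedule theta_i = 10000^(-2i/d). The function names are hypothetical. The `scale` argument is a swapped-in illustration of linear position interpolation, a separate technique layered on top of RoPE, not something from [0] and not necessarily what any particular model (or iRoPE) does.

```python
import math
from typing import List


def rope_rotate(x: List[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * theta_i.

    theta_i = base ** (-2i / d), following the RoFormer paper's schedule.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair by angle pos * theta
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def attention_score(q: List[float], k: List[float], m: float, n: float,
                    scale: float = 1.0) -> float:
    """Unnormalized score for a query at position m and a key at position n.

    scale > 1 sketches linear position interpolation (an assumption here):
    positions are divided by scale so a longer inference window maps back
    into the position range seen during training.
    """
    return dot(rope_rotate(q, m / scale), rope_rotate(k, n / scale))


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5, -0.5]
    k = [0.3, 0.7, -0.2, 0.1]
    # Same relative offset (3) at different absolute positions -> same score
    print(math.isclose(attention_score(q, k, 5, 2),
                       attention_score(q, k, 105, 102), abs_tol=1e-9))  # True
    # With scale=4, position 8000 is treated like position 2000
    print(attention_score(q, k, 8000, 0, scale=4.0)
          == attention_score(q, k, 2000, 0))  # True
```

	Because the rotations are orthogonal, the dot product of a query rotated by m and a key rotated by n only depends on n - m, which is why RoPE is described as a relative position encoding. Interpolation just squeezes unseen positions back into the trained range.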