| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by macrolime 807 days ago
	"This is more computationally efficient than performing a full content-based lookup across an entire memory buffer for each step in the future, and could be one step towards drastically increasing the context-length available for making a prediction." Is this how they get a context window of 10 million tokens? Or are they refering to even longer context windows in the future?