| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by BorisTheBrave 1120 days ago
	I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient.