| HN Mirror



	by Eisenstein 371 days ago
	I don't know, I think that extending context windows is actually detrimental because people assume they can just dump things in there until it fills up. You still have to deal with the limited attention that the models have, and only filling the context with things relevant to the particular thing you are trying to solve is going to be the most effective approach. If you have too much information for it to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote at over 1000 pages is less than 64,000 tokens.

1 comments

CamperBob2 371 days ago

That sounds low by about 10x, assuming Don Quixote has 430k words (per Google).

Still, yes, I don't know of a single model that doesn't go off the rails if you actually try to take advantage of its context length specification.


Eisenstein 370 days ago

Well, I loaded up Llama 3 and downloaded the novel: the English translation comes to 545,997 tokens and the original Spanish to 653,981 tokens. So my estimate was off by about an order of magnitude. Thanks for the correction.
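
For anyone who wants to check the arithmetic, here is a rough sketch using only the numbers quoted in this thread. The names (`summarize`, `WORD_COUNT`, and so on) are made up for illustration, "128K" is assumed to mean 128,000 tokens, and the 430k word count is the Google figure from upthread, which may not match either edition that was tokenized, so treat the tokens-per-word ratio as approximate.

```python
import math

# Figures quoted in the thread; nothing here is re-measured.
WORD_COUNT = 430_000        # word count "per Google" (edition unspecified)
ORIGINAL_GUESS = 64_000     # the first token estimate
CONTEXT_WINDOW = 128_000    # assumption: "128K" taken as 128,000 tokens
MEASURED_TOKENS = {
    "english": 545_997,     # Llama 3 tokenizer, English translation
    "spanish": 653_981,     # Llama 3 tokenizer, original Spanish
}


def summarize(tokens: int) -> dict:
    """Compare a measured token count against the thread's other numbers."""
    return {
        # How many times larger the measurement is than the first guess.
        "times_over_guess": tokens / ORIGINAL_GUESS,
        # Approximate ratio only: the word count may be for another edition.
        "tokens_per_word": tokens / WORD_COUNT,
        # Whole context windows needed to hold the full text.
        "windows_needed": math.ceil(tokens / CONTEXT_WINDOW),
    }


if __name__ == "__main__":
    for language, tokens in MEASURED_TOKENS.items():
        stats = summarize(tokens)
        print(
            f"{language}: {stats['times_over_guess']:.1f}x the guess, "
            f"{stats['tokens_per_word']:.2f} tokens/word, "
            f"{stats['windows_needed']} windows of {CONTEXT_WINDOW:,}"
        )
```

Either way the novel does not fit in a single 128K window, which is consistent with the correction above.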
