by Eisenstein
323 days ago
I don't know, I think extending context windows is actually detrimental, because people assume they can just dump things in until the window fills up. You still have to deal with the limited attention the models have, and filling the context only with things relevant to the particular problem you are trying to solve is going to be the most effective approach. If you have too much information to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote, at over 1000 pages, is less than 64,000 tokens.
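For anyone who wants to sanity-check a claim like the Don Quixote one, or see whether their own material fits a given window, here is a minimal sketch in Python. It assumes the common rule of thumb of roughly 4/3 tokens per English word; the real count depends on the model's tokenizer, and the function names, the 128K default, and the 4,000-token reserve are all made up for illustration.

```python
def estimate_tokens(text: str, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate from a whitespace word count.

    The 4/3 ratio is only a rule of thumb for English prose (an assumption);
    use the model's actual tokenizer when you need an exact number.
    """
    words = len(text.split())
    return round(words * tokens_per_word)


def fits_in_context(text: str, context_window: int = 128_000, reserve: int = 4_000) -> bool:
    """Check whether the estimated token count fits the window.

    `reserve` is a hypothetical allowance for the prompt and the reply.
    """
    return estimate_tokens(text) <= context_window - reserve


if __name__ == "__main__":
    sample = "word " * 1_000
    print(estimate_tokens(sample), fits_in_context(sample))
```

Fitting under the limit is only the first check. It says nothing about whether the model will actually attend to everything you put in there.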
Still, yes, I don't know of a single model that doesn't go off the rails if you actually try to take advantage of its context length specification.