by elorant
295 days ago
From my experience, the context window by itself tells only half the story. Load a big 200k-token document and ask a question about it, and it will answer just fine. Start a conversation that soon balloons past 100k tokens, and it starts losing coherence pretty quickly. So I guess batch size plays a more significant role.
Because my understanding is that, however you get to 100K, the 100,001st token is generated the same way as far as the model is concerned.
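
To make that last point concrete, here is a toy sketch in Python. It is not a real model: `next_token` is a hypothetical stand-in that just hashes its input, and the `<role>` markers in `flatten_chat` are an assumed template, not any vendor's actual format. All it illustrates is that a chat gets flattened into one token sequence, and the next token depends only on that sequence, not on how it was assembled.

```python
import hashlib


def next_token(context: list[str]) -> str:
    """Toy stand-in for a language model: a pure function of the token sequence."""
    digest = hashlib.sha256(" ".join(context).encode("utf-8")).hexdigest()
    return f"tok_{digest[:8]}"


def flatten_chat(turns: list[tuple[str, str]]) -> list[str]:
    """Serialize (role, text) turns into one flat token list (assumed template)."""
    tokens: list[str] = []
    for role, text in turns:
        tokens.append(f"<{role}>")   # hypothetical role marker
        tokens.extend(text.split())  # whitespace split stands in for a tokenizer
    return tokens


chat = [
    ("user", "summarize the report"),
    ("assistant", "it covers sales"),
    ("user", "and costs?"),
]

as_conversation = flatten_chat(chat)      # context built up turn by turn
as_single_prompt = list(as_conversation)  # the same tokens supplied in one go

# Identical sequences give the identical next token, whatever their history.
same = next_token(as_conversation) == next_token(as_single_prompt)
```

What this does not show is that a long chat and a single long document are the same sequence. They aren't, so any difference in behavior would have to come from what the tokens are rather than from a different generation mechanism.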