by guywithabowtie 990 days ago
We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite-length attention window to generalize to infinite sequence lengths without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.

Sorry, what does "up to 4 million tokens and more" mean? It seems like a contradiction.
It's not really a contradiction so much as redundant and poorly worded. It should have said "at least 4 million tokens".
Here's a reference describing what a context window for LLMs is:

https://www.hopsworks.ai/dictionary/context-window-for-llms