| HN Mirror


	by guywithabowtie 990 days ago
	We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite-length attention window to generalize to infinite sequence lengths without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.

1 comments

Sorry, what does "up to 4 million tokens and more" mean? It seems like a contradiction.

It's not really a contradiction so much as redundant and poorly worded. It should have said "at least 4 million tokens".

Here's a reference describing what a context window for LLMs is: