HN Mirror

Hacker News


	by throwaw12 39 days ago
	> interleaving the processing of 200ms worth of input and generation of 200ms worth of output.

	How does this work? Don't LLMs/transformers need the whole context to output the next chunk of tokens?
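Here is a minimal toy sketch of why interleaving can work. It rests on loudly stated assumptions. It assumes a causal (decoder-only) model. It uses a single attention head with a key/value cache and leaves out positional encodings and MLP layers. Every name in it (`CachedCausalAttention`, `interleave`, `full_attention`, `decode`) is hypothetical, as is treating one list of tokens as "200ms worth". None of it is the system the quote describes. In a causal model, the output at position t depends only on positions up to t. So each new chunk of input or output only has to attend to what is already cached. The sketch checks that processing chunk by chunk gives the same results as recomputing the whole sequence.

```python
import math
import random

DIM = 8  # toy hidden size (arbitrary, for illustration only)


def make_weights(seed):
    """Hypothetical random query/key/value projection matrices."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)] for _ in range(3)]


def embed(token):
    """Hypothetical deterministic embedding for an integer token id."""
    rng = random.Random(token)
    return [rng.gauss(0, 1) for _ in range(DIM)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Softmax-weighted average of `values`, scored against query `q`."""
    scores = [dot(q, k) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(DIM)]


class CachedCausalAttention:
    """Single causal attention head that keeps every past key/value (a KV cache)."""

    def __init__(self, seed=0):
        self.wq, self.wk, self.wv = make_weights(seed)
        self.keys, self.values = [], []

    def step(self, token):
        x = embed(token)
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        # The new position attends only to itself and earlier positions,
        # all of which are already cached, so nothing earlier is recomputed.
        return attend(matvec(self.wq, x), self.keys, self.values)


def full_attention(tokens, seed=0):
    """Reference: recompute every position from scratch with an explicit causal mask."""
    wq, wk, wv = make_weights(seed)
    xs = [embed(t) for t in tokens]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    # Position t sees only keys/values at positions 0..t.
    return [attend(matvec(wq, xs[t]), ks[: t + 1], vs[: t + 1]) for t in range(len(tokens))]


def decode(hidden):
    """Toy greedy 'output head': ids 100..107 mark generated tokens."""
    return 100 + max(range(DIM), key=lambda j: hidden[j])


def interleave(input_chunks, out_per_chunk, seed=0):
    """Alternate: absorb one chunk of input, then emit a few output tokens."""
    attn = CachedCausalAttention(seed)
    history, generated, outputs = [], [], []
    for chunk in input_chunks:
        for tok in chunk:  # process this chunk's input (extends the cache)
            outputs.append(attn.step(tok))
            history.append(tok)
        for _ in range(out_per_chunk):  # generate, conditioned only on the prefix so far
            tok = decode(outputs[-1])
            generated.append(tok)
            outputs.append(attn.step(tok))
            history.append(tok)
    return history, generated, outputs


if __name__ == "__main__":
    # Hypothetical: each inner list stands for "200ms worth" of input tokens.
    chunks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    history, generated, cached = interleave(chunks, out_per_chunk=2)
    reference = full_attention(history)
    same = all(math.isclose(a, b, abs_tol=1e-9) for o, r in zip(cached, reference) for a, b in zip(o, r))
    print("generated:", generated)
    print("incremental == full recompute:", same)
```

The equivalence holds only because of the causal mask. A bidirectional encoder would have to recompute earlier positions whenever new input arrived, because their representations depend on later tokens too.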