Hacker News
by throwaw12 39 days ago
> interleaving the processing of 200ms worth of input and generation of 200ms worth of output.

How does this work? Don't LLMs/transformers need the whole context to output the next chunk of tokens?