Hacker News
by throwaw12 39 days ago
> interleaving the processing of 200ms worth of input and generation of 200ms worth of output.
How does this work? Don't LLMs/transformers need the whole context to output the next chunk of tokens?