Hacker News
by animan 355 days ago
That snippet is saying that you can compute the KV (keys and values) for all the input tokens at once. You don't need to loop over them, since they're all available up front.
Decode is different: each new token depends on the ones before it, so you have to generate tokens sequentially, one at a time.
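
To make the contrast concrete, here's a rough toy sketch in plain Python. It is not the code from the snippet being discussed. The names (`prefill`, `attend`, `decode`), the width `D`, the random weights, and the shortcut of feeding the attention output back in as the "next token" are all assumptions for illustration. A real model would sample a token and embed it, and would run the prefill as one matrix multiply on the accelerator.

```python
import math
import random

D = 4  # toy model width (assumed for illustration)

def matmul(a, b):
    """Multiply an (n x D) list-of-rows by a (D x D) list-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def make_weights(seed):
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

# Hypothetical projection matrices for keys, values, and queries.
W_K, W_V, W_Q = make_weights(0), make_weights(1), make_weights(2)

def prefill(tokens):
    """Compute K and V for every given token in one batched call.

    All input tokens are known up front, so nothing here depends on
    the result for an earlier token.
    """
    return matmul(tokens, W_K), matmul(tokens, W_V)

def attend(x, keys, values):
    """Single-head softmax attention of one token over the cached K/V."""
    q = matmul([x], W_Q)[0]
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[j] for w, v in zip(weights, values)) for j in range(D)]

def decode(prompt, steps):
    """Generate `steps` toy tokens, one per iteration."""
    keys, values = prefill(prompt)       # whole prompt at once
    x = prompt[-1]
    outputs = []
    for _ in range(steps):               # this loop cannot be batched away:
        x = attend(x, keys, values)      # step t needs the token from step t-1
        k, v = prefill([x])              # K/V for just the new token
        keys.append(k[0])                # grow the cache by one entry
        values.append(v[0])
        outputs.append(x)
    return outputs, keys
```

The point is the shape of the two phases: `prefill(prompt)` handles N tokens in one call, while `decode` has to go around the loop once per generated token, because each step's input is the previous step's output.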