by HarHarVeryFunny
236 days ago
Actually, because of causal (masked) attention, new tokens appended to the input have no effect on what's calculated internally (the "plan") at earlier positions in the input, which is why a modern LLM uses a KV cache rather than recalculating those earlier positions. In other words, the "recalculated" plan would be exactly the same as before, just extended with new planning at the position of each newly appended token.
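
To make that concrete, here's a toy sketch of a single causal attention head in plain Python. Everything in it is made up for illustration (the width `D`, the random weight matrices, the helper names `full_forward` and `cached_step`); it isn't how any particular model is implemented. It just checks two things: appending a token leaves the outputs at earlier positions unchanged, and stepping with a KV cache gives the same outputs as recomputing from scratch. The same argument applies layer by layer in a deeper model.

```python
import math
import random

D = 4  # toy model width (an assumption for this sketch)
rng = random.Random(0)
# Hypothetical fixed projection matrices for queries, keys and values.
W_q, W_k, W_v = ([[rng.gauss(0, 1) for _ in range(D)] for _ in range(D)] for _ in range(3))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Softmax attention of one query over the given keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(D)]

def full_forward(xs):
    """Recompute every position from scratch; position i only sees 0..i."""
    keys = [matvec(W_k, x) for x in xs]
    values = [matvec(W_v, x) for x in xs]
    return [attend(matvec(W_q, x), keys[: i + 1], values[: i + 1]) for i, x in enumerate(xs)]

def cached_step(x, cache):
    """Process one new token, reusing the keys/values already in the cache."""
    cache["k"].append(matvec(W_k, x))
    cache["v"].append(matvec(W_v, x))
    return attend(matvec(W_q, x), cache["k"], cache["v"])

def all_close(a, b):
    return all(math.isclose(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

prompt = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(5)]
new_token = [rng.gauss(0, 1) for _ in range(D)]

before = full_forward(prompt)
after = full_forward(prompt + [new_token])

cache = {"k": [], "v": []}
incremental = [cached_step(x, cache) for x in prompt + [new_token]]

# Earlier positions are unaffected by the appended token.
print(all_close(after[: len(prompt)], before))
# Cached, one-token-at-a-time processing matches the full recompute.
print(all_close(incremental, after))
```

The only new work when a token is appended is one extra row: the new token's query attending over the cached keys and values plus its own.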