Hacker News new | ask | show | jobs
by HarHarVeryFunny 753 days ago
There's the KV cache kept from one word to the next, but isn't that just an optimization ?
1 comments

Yes, the 'KV cache' (imo an invented novelty, everyone was doing this before they came up with a term to make it sound cool) is an optimization so that you don't have to recompute what the model was thinking when it was generating all the prior words every time you decode a new word.

But that's exactly what I'm saying - the model has access to what it was thinking when it generated the previous words, it does not start from scratch. If you don't have the KV cache, you still have to regenerate what it was thinking from the previous words so on the next word generation you can look back at what you were thinking from the previous words. Does that make sense? I'm not great at talking about this stuff in words

I don't think you can really say it "regenerates" what it was thinking from the last prompt, since the new prompt is different from the previous one (it has the new word appended to the end, which may change the potential meanings of the sentence).

There will be some overlap in what the model is now "thinking" (and has calculated from scratch) since the new prompt is one possible continuation of the previous one, but other things it was previously "thinking" will no longer be there.

e.g. Say the prompt was "the man", and output probabilities include "in" and "ran", reflecting the model thinking of potential continuations such as "the man in the corner" and "the man ran for mayor". Suppose the word sampled was "ran", so now the new prompt is "the man ran". Possible continuations can no longer include refining who the subject is, since the new word "ran" implies the continuation must now be an action.

There is some work that has been saved, per the KV cache, in processing the new prompt, but that is only things (self attention among the common part of the two prompts) that would not change if recalculated. What the model is thinking has changed, and will continue to change depending on the next sampled continuation ("the man ran for mayor", "the man ran for cover", "the man ran his bath", etc).