| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by nl 58 days ago

> upload and restore it when the user starts their next interaction

The data is the conversation (along with the thinking tokens).

There is no download - you already have it.

The issue is that it gets expunged from the (very expensive, very limited) GPU cache and to reload the cache you have to reprocess the whole conversation.

That is doable, but as Boris notes it costs lots of tokens.

1 comments

vanviegen 58 days ago

You're quite confidently wrong! :-)

The kv-cache is the internal LLM state after having processed the tokens. It's big, and you do not have it locally.

link

nl 58 days ago

> The kv-cache is the internal LLM state after having processed the tokens. It's big, and you do not have it locally.

Yes - generated from the data of the conversation.

Read what I said again. I'm explaining how they regenerate the cache by running the conversation though the LLM to reconstruct the KV cache state.

link