| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by msoloviev 555 days ago

I wonder how the "ring context" works under the hood. I have previously had (and recently messed around with again) a somewhat similar project designed for a more toy/exploratory setting (https://github.com/blackhole89/autopen - demo video at https://www.youtube.com/watch?v=1O1T2q2t7i4), and one of the main problems to address definitively is the question of how to manage your KV cache cleverly so you don't have to constantly perform too much expensive recomputation whenever the buffer undergoes local changes.

The solution I came up with involved maintaining a tree of tokens branching whenever an alternative next token was explored, with full LLM state snapshots at fixed depth intervals so that the buffer would only have to be "replayed" for a few tokens when something changed. I wonder if there are some mathematical properties of how the important parts of the state (really, the KV cache, which can be thought of as a partial precomputation of the operation that one LLM iteration performs on the context) work that could have made this more efficient, like to avoid saving full snapshots or perhaps to be able to prune the "oldest" tokens out of a state efficiently.

(edit: Georgi's comment that beat me by 3 minutes appears to be pointing at information that would go some way to answer my questions!)