| > Recursive summarization (Wu et al., 2021b) is a simple way to address overflowing context windows, however, recursive summarization is inherently lossy and eventually leads to large holes in the memory of the system. Yes, it does. > In our experiments ... conversational context is read-only with a special eviction policy (if the queue reaches a certain size, a portion of the front is truncated or compressed via recursive summarization), and working context is writeable by the LLM processor via function calls. You're doing the same thing, and you have the same problems. You're just doing it slightly differently; in this case, instead of recursively summarizing everything, you're selectively searching the history and generating it for each request. Cool idea. ...but I'm skeptical; this fundamentally relies on the assumption that the existing context consists of low-entropy, summarizable content, and that any query relies on only a subset of the history. This might be true for, e.g., chat, or 'answer a question about some document in this massive set of documents'. ...but both of these assumptions are false in some contexts; for example, generating code, where the context is densely packed with information that is not discardable (e.g. specific API definitions), and a wide context is required (i.e. many API definitions). It is interesting how this is structured and done, and hey, the demo is cool. I'm annoyed to see these papers about summarization fail to acknowledge the fundamental limitations of the approach. |
These are both separate from MemGPT's external context, which is pulled into the conversation queue via function calls. In all our examples, reads from external context are uncompressed (no summarization) and paginated. MemGPT receives a system alert when the queue summarization is triggered, so if MemGPT needs to keep specific details from the conversation queue it can write them to working context before they're erased or summarized.
In the conversational agent examples, working context (no summarization, and separate from the conversation queue) is used to store key facts about the user and agent to maintain a consistent conversation. Because the working context is always seen by the LLM, there's no need to retrieve it to see it. In doc QA, working context can be used to keep track of the current task/question and progress towards that task (for complex queries, this helps MemGPT keep track of details like the previous search, previous page request, etc.).
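To make the three tiers concrete, here is a rough toy sketch in Python. It is not MemGPT's actual code or API: the class `ToyMemory`, its method names, the thresholds, the `on_alert` callback standing in for the system alert, and the placeholder "summary" string are all made up for illustration. It also assumes every message is archived verbatim on arrival, which is a simplification of this sketch rather than something stated above.

```python
from collections import deque


class ToyMemory:
    """Hypothetical three-tier memory; names and sizes are illustrative only."""

    def __init__(self, max_queue=4, evict_count=2):
        self.max_queue = max_queue      # eviction threshold in messages (assumed)
        self.evict_count = evict_count  # oldest messages evicted per trigger (assumed)
        self.working_context = []       # never summarized, always in the prompt
        self.queue = deque()            # conversation queue, subject to eviction
        self.external = []              # uncompressed archive, read page by page

    def working_context_write(self, fact):
        # Stand-in for the function call the LLM would use to save a detail.
        self.working_context.append(fact)

    def append(self, message, on_alert=None):
        self.queue.append(message)
        self.external.append(message)  # sketch assumption: archive verbatim on arrival
        if len(self.queue) <= self.max_queue:
            return
        # Eviction is triggered: alert first, so details can be saved in time.
        pending = [self.queue[i] for i in range(self.evict_count)]
        if on_alert is not None:
            on_alert(self, pending)
        for _ in range(self.evict_count):
            self.queue.popleft()
        # A real system would call an LLM summarizer here; this is a placeholder.
        self.queue.appendleft(f"[summary of {self.evict_count} earlier messages]")

    def read_external(self, page, page_size=2):
        # Paginated, uncompressed read from external context.
        start = page * page_size
        return self.external[start:start + page_size]

    def prompt(self):
        # Working context is always included, so it never needs retrieval.
        return "\n".join(
            ["# working context", *self.working_context, "# conversation", *self.queue]
        )


def keep_names(memory, pending):
    # Toy policy: save any message that states the user's name.
    for msg in pending:
        if "my name is" in msg:
            memory.working_context_write(msg)


mem = ToyMemory(max_queue=4, evict_count=2)
for msg in ["user: my name is Ada", "agent: hi Ada", "user: q1", "agent: a1", "user: q2"]:
    mem.append(msg, on_alert=keep_names)
print(mem.prompt())
```

In this toy run the fifth message pushes the queue over its limit, the alert callback copies the name into working context, and the two oldest messages are replaced in the queue by a placeholder summary. The originals are still readable, uncompressed, through `read_external(0)`.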