| HN Mirror

Explicit memory management (MemGPT-style) vs implicit/external memory management is an interesting tradeoff. Like you said, adding all the instructions on how to manage memory consumes ~1k tokens (using the default prompts on our MemGPT GitHub release), which is a lot when your context window is 8k. Additionally, it requires the base LLM to be very good at instruction following; gpt-4 can do it well, but it's much more difficult to get explicit memory management to work with gpt-3.5-turbo or llama2 70b finetunes (so to build a robust system, you may have to end up having to "split" the thinking out of necessity).

One of the main benefits of explicit memory management is simplicity - e.g., you don't have to manage logic between a "memory creation" thread and a "dialogue thread". The explicit approach also integrates well with the iterative paging/retrieval for document analysis we demo in the paper/on GitHub.