by gdad
111 days ago
Building in the same space (Maximem Vity, cross-LLM memory as a Chrome extension and OpenClaw plugin). The SMTP analogy really resonates with me. I wrote a comparison a few weeks ago of how ChatGPT, Claude, and OpenClaw actually implement memory under the hood [1] and the architectures are so different that interop feels almost accidental when it works.
One thing I keep running into: the hard part isn't storage or retrieval. It's qualification. Deciding what's worth remembering from a conversation vs. what's throwaway context. ChatGPT takes the "summarize the last 15 chats" approach, Claude does on-demand search, and both have real failure modes. We went with a semantic graph that tries to capture relationships between memories (your preference for serverless connects to your AWS project connects to your cost constraints) rather than flat key-value pairs. Still iterating on it honestly.
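To make the graph-vs-flat-pairs distinction concrete, here is a minimal sketch of linked memories in Python. It is not Maximem's implementation; the `MemoryGraph` class, its method names, and the hop limit are all hypothetical, chosen only to show how one memory can pull in related ones that a flat key-value lookup would miss.

```python
from collections import defaultdict, deque


class MemoryGraph:
    """Toy undirected graph of memories (hypothetical, for illustration)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a, b):
        # Record a relationship between two memories in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related(self, start, max_hops=2):
        # Breadth-first walk: collect memories within max_hops of start.
        seen = {start}
        found = []
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for neighbour in sorted(self.edges[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    found.append(neighbour)
                    queue.append((neighbour, hops + 1))
        return found


graph = MemoryGraph()
graph.link("prefers serverless", "AWS project")
graph.link("AWS project", "cost constraints")

# Starting from the preference, the walk also surfaces the cost constraints.
print(graph.related("prefers serverless"))
```

A flat store would return only the value stored under "prefers serverless"; the walk is what brings in the connected project and budget context.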
Re: Claude's import-memory launch yesterday, I think the timing validates the category but the approach is fundamentally migration, not sync. You import once, and from that moment your contexts diverge again. Anyone using 3+ tools daily (which is basically everyone I talk to) is back to fragmented memory within a week.
Curious about your retrieval approach. Are you doing hybrid search or pure semantic?
[1] https://www.maximem.ai/blog/ai-apps-memory
|
I like your approach - I went down a bit more of a swiss-army-knife one with a hybrid.
"Session memory": turns are stored in full so that the full context can be retrieved if an important fact is missed in compaction. After a certain time, sessions are compacted. I'm currently evaluating a model where only a certain number of turns per session are kept in plain text and compaction happens as a sliding window.
"Knowledge memory": all info is periodically fed into knowledge graph extraction, and a knowledge graph is built and indexed.
"Memory chunking": chunks of memory are stored individually in a vector space, where they can be retrieved through similarity search as well as standard semantic searches.
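The sliding-window idea for session memory could look roughly like the sketch below. Everything here is an assumption for illustration: the `Session` class, the `keep_last` parameter, and especially `naive_summarize`, which just truncates text where a real system would presumably call an LLM.

```python
from dataclasses import dataclass, field


def naive_summarize(turns):
    # Placeholder summarizer (assumption): truncate and join the turns.
    return " | ".join(turn[:20] for turn in turns)


@dataclass
class Session:
    keep_last: int = 3                             # turns kept in plain text
    recent: list = field(default_factory=list)     # the sliding window
    summaries: list = field(default_factory=list)  # compacted older turns
    archive: list = field(default_factory=list)    # full history, never compacted

    def add_turn(self, text):
        self.archive.append(text)
        self.recent.append(text)
        overflow = len(self.recent) - self.keep_last
        if overflow > 0:
            # Turns that slide out of the window are compacted.
            old = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            self.summaries.append(naive_summarize(old))


session = Session(keep_last=2)
for turn in ["use Lambda", "budget is tight", "deploy on Friday"]:
    session.add_turn(turn)
```

Keeping `archive` alongside the summaries mirrors the "store more, not less" idea: if compaction drops a detail, the full turn is still there to go back to.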
My theory is that giving the agent flexibility to query the tool best suited for its current needs is the way to go. As agents/LLMs become better, they'll only get better at summarisation and tool choice.
A couple of reasons I like storing more data over less: storage is cheap, and if compaction misses details, you can prompt your agent to go back to that session you had about X last week and extract that detail.
Similar to you, however, it's still very early stages and the ground is shifting rapidly. Claude Code's HTTP hooks are a great step towards a model that I believe can work, assuming it gets adopted as a standard. A big problem with making memory portable is that you gotta be able to plug it into any client you want, and for the time being, MCP + system prompt is the only option - and even that one's more than flaky.
Would be awesome to chat to someone from OpenAI or Anthropic about their take and pow-wow on options.