Hacker News
by kuboble 47 days ago
That's an interesting concept. So it's like if you're an agent chatting with a user, you have an army of assistants who overhear the conversation and record important facts, or search relevant facts on some database and decide on the fly when to interrupt you with "this memory X looks relevant". Sounds easy enough if tokens were free, but an interesting problem to do it efficiently.
3 comments

Burst-parallel non-frontier models can come close to "tokens were free". And then one might augment not just conversations but CoT: retroactively, by submitting messages with altered reasoning strings, or inline, with the inference loop watching the CoT and attempting non-distracting injection.
Simple vector similarity plus a cheap model to filter results works pretty well. Though ofc it does add tokens to your primary chat, which is the basic tradeoff of memory systems in general (in addition to latency).
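A rough sketch of that two-stage idea, not anyone's actual implementation. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `cheap_filter` is a hypothetical callable standing in for the cheap model's yes/no relevance check, and `top_k` / `min_score` are made-up knobs.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(
    query: str,
    memories: List[str],
    cheap_filter: Callable[[str, str], bool],  # hypothetical cheap-model check
    top_k: int = 3,
    min_score: float = 0.1,
) -> List[str]:
    """Stage 1: rank memories by similarity. Stage 2: let the filter veto."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(m)), m) for m in memories), reverse=True)
    candidates = [m for score, m in scored[:top_k] if score >= min_score]
    # Only memories that survive the filter get injected into the main chat,
    # which is where the extra tokens (and latency) come from.
    return [m for m in candidates if cheap_filter(query, m)]


memories = [
    "user prefers dark mode in the editor",
    "user's dog is named Rex",
    "deploy target is staging cluster",
]
hits = recall(
    "which editor mode does the user prefer",
    memories,
    cheap_filter=lambda query, memory: True,  # accept everything in this demo
)
```

In practice the filter would be a call to a small model asking "is this memory relevant to the query?", and the similarity stage exists mostly to keep the number of those calls small.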
That's exactly what claude-code does these days. If you AFK for ~5 minutes it also produces a summary of where you are, which is useful if you're juggling multiple windows.