| HN Mirror



	by kuboble 95 days ago
	That's an interesting concept. So it's like if you're an agent chatting with a user, you have an army of assistants who overhear the conversation and record important facts, or search for relevant facts in some database, and decide on the fly when to interrupt you with "this memory X looks relevant". That would be easy enough if tokens were free, but doing it efficiently is an interesting problem.

3 comments

mncharity 94 days ago

Burst-parallel non-frontier models can resemble "tokens were free". And there one might potentially augment not just conversations, but CoT - retroactively by submitting messages with altered reasoning strings, or inline with the inference loop watching CoT and attempting non-distracting injection.


jjfoooo4 94 days ago

Simple vector similarity plus a cheap model to filter results works pretty well. Though ofc it does add tokens to your primary chat, which is the basic tradeoff of memory systems in general (in addition to latency).
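
A rough sketch of that two-stage idea, in case it helps make it concrete. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `cheap_filter` is a keyword rule standing in for a call to a small model, and the names `recall`, `top_k` and `min_score` are made up rather than taken from any particular memory system.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cheap_filter(query, memory):
    """Hypothetical placeholder for the cheap model: keep a memory only
    if it shares at least two words with the query."""
    shared = set(query.lower().split()) & set(memory.lower().split())
    return len(shared) >= 2


def recall(query, memories, top_k=3, min_score=0.1, keep=cheap_filter):
    """Stage 1: rank memories by vector similarity.
    Stage 2: let the cheap filter veto weak candidates."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(m)), m) for m in memories), reverse=True)
    candidates = [m for score, m in scored[:top_k] if score >= min_score]
    return [m for m in candidates if keep(query, m)]


memories = [
    "user prefers dark mode in the editor",
    "user deploys the api with docker compose",
    "user has a cat named miso",
]
result = recall("how do I deploy the api again", memories)
print(result)
```

In this toy run the dark-mode memory clears the similarity threshold only because of a shared "the", and the second stage drops it, so only the deploy memory would get injected into the main chat. That is the token tradeoff in miniature: the filter costs a cheap call per candidate but keeps junk out of the expensive context.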


eterm 95 days ago

That's exactly what claude-code does these days. If you AFK for ~5 minutes it also produces a summary of where you are, which is useful if you're juggling multiple windows.
