

by itay-maman 134 days ago

Interesting writeup. The tiered retrieval approach and privacy model for group chats are well thought out.

One thing I'd love to see: what does "actually works" mean in measurable terms? The engineering is sophisticated, but I'm curious about user-facing impact - did memory injection improve task completion or satisfaction? How often do users invoke /memory forget? What's the false positive rate on extraction?

These systems are hard to evaluate because failure modes are subtle - the AI "knows" something but uses it awkwardly, or surfaces context that feels intrusive. Would be great to hear what metrics you're tracking to validate the complexity is paying off.


intheleantime 134 days ago

Thank you, and great question. Right now, feedback is qualitative only (surveys, feedback buttons, controlled user tests). We are trying to build AI evaluators, but they run into the same problem when judging whether the “right” memory was pulled.

Still trying to find a good solution here.


itay-maman 134 days ago

I am not sure how prevalent this use case is on your system, but in my sessions with ChatGPT, Claude web, and Claude Code, I often find that I enjoy the fact that they are stateless. I can give a fresh context of who I am and get a suitable reply.


intheleantime 134 days ago

There is something to be said for that, I agree. For that reason, you can turn off memory inside a chat thread and also create temporary threads that do not use memory.
