Hacker News
HaluMem: Evaluating Hallucinations in Memory Systems of Agents (arxiv.org)
2 points by timini 221 days ago
1 comment

HaluMem is presented as the first benchmark for evaluating hallucinations in agent memory systems at the operation level. Its three evaluation tasks (memory extraction, memory updating, and question answering) show that existing memory systems generate and accumulate hallucinations during the early stages, and that these errors then propagate downstream. The benchmark uses two datasets with different context scales to expose these failure modes systematically.
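To make "operation level" concrete, here is a rough sketch of how a single extraction step could be scored against reference memories. This is not HaluMem's actual metric or code: the names `ExtractionScore` and `score_extraction`, the exact-string matching, and the sample memories are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionScore:
    recall: float              # share of reference memories the system captured
    hallucination_rate: float  # share of stored memories with no reference support


def score_extraction(extracted: set[str], reference: set[str]) -> ExtractionScore:
    """Score one extraction step (hypothetical metric, exact-match only)."""
    supported = extracted & reference
    # Convention chosen here: an empty reference set counts as fully recalled.
    recall = len(supported) / len(reference) if reference else 1.0
    # Anything stored that the reference does not contain counts as hallucinated.
    unsupported = extracted - reference
    hallucination_rate = len(unsupported) / len(extracted) if extracted else 0.0
    return ExtractionScore(recall, hallucination_rate)


# Made-up example data, not taken from the benchmark.
reference = {"user lives in Berlin", "user owns a dog"}
extracted = {"user lives in Berlin", "user owns a cat"}
score = score_extraction(extracted, reference)
print(score)
```

Scoring each step separately, rather than only the final answer, is what would let you see a wrong memory appear at extraction time before it shows up as a wrong answer later. A real evaluation would presumably need semantic matching instead of exact string comparison.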