Besides that.. Agents reporting task completion while the system state says otherwise is predictable once you think about it. Next-token prediction optimizes for plausible outputs, not ground truth.
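A defender's response to the point above is to treat an agent's "done" as a claim to be reconciled against observed state rather than as a fact. The following is an added example, not code from the thread or from the paper it discusses: a small Python reconciler that takes structured completion claims (a hypothetical `file_written` / `file_deleted` claim format assumed here) and checks each against a real workspace directory, flagging claims the filesystem does not support. The scenario in `demo()` is constructed for illustration.

```python
"""Reconcile an agent's self-reported task completion against observed state."""
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def verify_claims(claims: list[dict], root: str) -> list[dict]:
    """Return one discrepancy record per claim the workspace does not support.

    Assumed (hypothetical) claim format, one dict per claim:
      {"kind": "file_written", "path": <relative>, "sha256": <hex>}
      {"kind": "file_deleted", "path": <relative>}
    Paths are relative to `root`; anything resolving outside `root` is flagged
    rather than checked, so a bad path cannot be used to probe the host.
    """
    findings = []
    root_real = os.path.realpath(root)
    for claim in claims:
        full = os.path.realpath(os.path.join(root, claim.get("path", "")))
        if not full.startswith(root_real + os.sep):
            findings.append({"claim": claim, "reason": "path escapes workspace"})
            continue
        kind = claim.get("kind")
        if kind == "file_written":
            if not os.path.isfile(full):
                findings.append({"claim": claim, "reason": "claimed file does not exist"})
            elif sha256_of(full) != claim.get("sha256"):
                findings.append({"claim": claim, "reason": "content hash mismatch"})
        elif kind == "file_deleted":
            if os.path.exists(full):
                findings.append({"claim": claim, "reason": "claimed-deleted file still present"})
        else:
            findings.append({"claim": claim, "reason": f"unknown claim kind {kind!r}"})
    return findings


def demo() -> list[dict]:
    """Constructed scenario: the agent reports four actions; only one is true."""
    with tempfile.TemporaryDirectory() as root:
        # Ground truth we set up ourselves: report.txt exists with known
        # content, and old.log was never actually removed.
        with open(os.path.join(root, "report.txt"), "w") as fh:
            fh.write("summary: ok\n")
        with open(os.path.join(root, "old.log"), "w") as fh:
            fh.write("stale\n")
        true_hash = hashlib.sha256(b"summary: ok\n").hexdigest()
        claims = [
            {"kind": "file_written", "path": "report.txt", "sha256": true_hash},   # true
            {"kind": "file_written", "path": "notes.md", "sha256": "0" * 64},       # never created
            {"kind": "file_deleted", "path": "old.log"},                            # still present
            {"kind": "file_written", "path": "../outside.txt", "sha256": "0" * 64},  # outside workspace
        ]
        return verify_claims(claims, root)


if __name__ == "__main__":
    for finding in demo():
        print(finding["reason"], "->", finding["claim"])
```

The design choice is that the agent's transcript is never consulted: only its structured claims and the state they should have produced. Any claim that cannot be tied to an observable effect is a discrepancy to log and surface, not something to believe.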
This work was performed by people across 13 institutions, invited and coordinated through the team at Northeastern. A research "swarm" seems like a great model for this kind of work. I'm curious about how it was funded; I didn't see any acknowledgements that way. The intro references the NIST Agent Standards Initiative. Also, the acknowledgement to "Andy Ardity" should be "Andy Arditi"?
TL;DR: The authors found current-generation AI agents are too unreliable, too untrustworthy, and too unsafe for real-world use.
Quoting from the abstract:
"We report an exploratory red-teaming study of autonomous language-model–powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions."
"Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover."
https://news.ycombinator.com/item?id=47196883
https://news.ycombinator.com/item?id=47134473
https://news.ycombinator.com/item?id=47147764
https://news.ycombinator.com/item?id=47141321