Hacker News
by chirdeeps 95 days ago
This thread is an incredible resource for adversarial security testing, but I'd love to pull on the "Cascade failures" (#5) thread from the original post, because that's what actually takes down production systems most often.

We spend so much time testing whether the model will break, and almost no time testing whether the workflow can recover when the model inevitably does. If an agent is executing a 4-step sequence and fails on step 3, how do you test what happens next? Does it orphan the data from steps 1 and 2? Does it retry forever and duplicate records?

The biggest gap in agent testing right now is that we test agents as if they were stateless functions, when in reality they are long-running stateful processes. You can't just test the prompt; you have to test the system's idempotency. If you can't safely kill an agent mid-task and restart it without corrupting your database, the system isn't production-ready, regardless of how robust your prompt injection firewall is.

Please do share the framework. I'm curious where we're missing the point; the surface is ever-expanding post Openclaw.
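
To make the kill-and-restart question concrete, here is a minimal sketch of what such a test could look like. Everything in it is an assumption for illustration, not anyone's actual framework: the four step names, the `steps_done` and `records` tables, the `Crash` exception standing in for a killed process, and the use of an in-memory SQLite database. Each step writes its output under an idempotency key and records a checkpoint in the same transaction, so a crash mid-step rolls back cleanly and a restart skips work that already finished.

```python
import sqlite3


class Crash(Exception):
    """Stands in for the agent process being killed mid-task."""


# Hypothetical 4-step sequence; the names are illustrative only.
STEPS = ["fetch", "transform", "write", "notify"]


def init_db(conn):
    # Checkpoint table: which steps of which run have completed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps_done ("
        "run_id TEXT, step TEXT, PRIMARY KEY (run_id, step))"
    )
    # Output table: the primary key doubles as an idempotency key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "idempotency_key TEXT PRIMARY KEY, payload TEXT)"
    )


def run_workflow(conn, run_id, crash_during=None):
    for step in STEPS:
        done = conn.execute(
            "SELECT 1 FROM steps_done WHERE run_id = ? AND step = ?",
            (run_id, step),
        ).fetchone()
        if done:
            continue  # finished in an earlier attempt, so skip it
        # One transaction per step: the write and its checkpoint commit
        # together, or roll back together if the step dies part-way.
        with conn:
            conn.execute(
                "INSERT OR IGNORE INTO records VALUES (?, ?)",
                (f"{run_id}:{step}", f"output of {step}"),
            )
            if step == crash_during:
                raise Crash(step)  # killed after the write, before the checkpoint
            conn.execute("INSERT INTO steps_done VALUES (?, ?)", (run_id, step))


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def kill_and_restart_test():
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    try:
        run_workflow(conn, "run-1", crash_during="write")  # dies on step 3
    except Crash:
        pass
    records_after_crash = count(conn, "records")
    run_workflow(conn, "run-1")  # restart resumes from step 3
    run_workflow(conn, "run-1")  # a redundant retry should change nothing
    return records_after_crash, count(conn, "records"), count(conn, "steps_done")
```

The assertions worth making are that the crash leaves only the output of steps 1 and 2 behind (nothing half-written from step 3), and that any number of restarts ends with exactly one record per step. The caveat is that real agents usually have side effects outside the database (API calls, emails, payments) that cannot share a transaction with the checkpoint. For those, the idempotency key has to be passed to the external system, and the test has to kill the process between the external call and the checkpoint write.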