Hacker News new | ask | show | jobs
by bettaher_adam 85 days ago
The fail-closed approach is the right default. One thing I'd add to the attack classes you're considering: prompt injection via filesystem reads — an attacker can craft a file that, when read by the agent, injects instructions into the tool-call chain.

We solved a similar boundary problem by signing all outputs with HMAC-SHA256 so downstream consumers can verify the response wasn't modified after the tool-call boundary. Not a replacement for your approach but complementary — input validation + output signing covers both ends.

Is the MCPSEC benchmark public yet?