by sam_chenard
96 days ago
A few concrete failure modes worth naming:

*Prompt injection via email* is the scariest one: crafted messages that tell your agent to "ignore previous instructions" or exfiltrate data. Most infra just pipes the raw body into context with no sanitization.

*Runaway sends*: an agent with no daily limit and a bug in its loop can burn your SES reputation in hours. Once you're on a blocklist, digging out can take weeks.

The Meta inbox incident last month (agent bulk-deleted emails, ignored stop commands) is a good illustration of why a "kill switch" and action budgets matter, not just rate limits.

The guardrails that actually matter: per-agent send limits, injection scanning before content hits the LLM context, isolated sending reputation, and webhook auto-disable on failures. We built some of this into LobsterMail if you want to see one approach: https://lobstermail.ai/
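To make the first two guardrails concrete, here's a rough Python sketch of a per-agent send budget with a kill switch, plus a naive injection check that runs before the body reaches the LLM. This is not LobsterMail's API. `SendBudget`, `looks_like_injection`, and the phrase list are all made-up names for illustration, and a regex list like this would only catch the laziest attacks.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical phrase list; real injection scanning needs much more than regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|above) prompt", re.IGNORECASE),
]


def looks_like_injection(body: str) -> bool:
    """Return True if the email body matches any known injection phrase."""
    return any(p.search(body) for p in SUSPICIOUS_PATTERNS)


@dataclass
class SendBudget:
    """Per-agent daily send cap plus a manual kill switch (assumed design)."""

    daily_limit: int
    killed: bool = False
    sent_today: int = field(default=0, init=False)
    day: int = field(default_factory=lambda: int(time.time() // 86400), init=False)

    def allow_send(self, now: Optional[float] = None) -> bool:
        """Count one send against today's budget; False means do not send."""
        ts = time.time() if now is None else now
        today = int(ts // 86400)  # whole UTC days since the epoch
        if today != self.day:
            # A new day started: reset the counter.
            self.day = today
            self.sent_today = 0
        if self.killed or self.sent_today >= self.daily_limit:
            return False
        self.sent_today += 1
        return True

    def kill(self) -> None:
        """Hard stop: every later allow_send call returns False."""
        self.killed = True
```

The point of the design is that the agent never calls the mail API directly. Every send goes through `allow_send()`, and `kill()` is flipped by something outside the agent loop (an operator, a webhook-failure counter), so a buggy or hijacked agent can't talk its way past it.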