by sam_chenard 96 days ago

A few concrete failure modes worth naming:

*Prompt injection via email* is the scariest one: crafted messages that tell your agent to "ignore previous instructions" or to exfiltrate data. Most infra just pipes the raw body into context with no sanitization.
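A rough sketch of what scanning before the body reaches context could look like. The pattern list, the function names, and the `<untrusted_email>` wrapper are all made up for illustration; regex matching like this only catches the laziest attacks and is not a complete defense.

```python
import re

# Hypothetical phrase list for illustration; real attacks are far more varied.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|previous)\s+prompt", re.I),
    re.compile(r"(forward|send|export)\s+(all|every)\s+(emails?|messages?|contacts?)", re.I),
]


def scan_email_body(body: str) -> list:
    """Return the first matching snippet for each suspicious pattern."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(body)
        if match:
            hits.append(match.group(0))
    return hits


def wrap_for_context(body: str) -> str:
    """Withhold flagged mail; otherwise label the body as untrusted data."""
    if scan_email_body(body):
        return "[email withheld: possible prompt injection]"
    return "<untrusted_email>\n" + body + "\n</untrusted_email>"
```

The point is less the specific patterns than the placement: the check runs on the raw body before the model ever sees it.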

*Runaway sends* are the other big one: an agent with no daily limit and a bug in its loop can burn your SES reputation in hours. Once you're on a blocklist, digging out takes weeks.
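A daily cap is cheap to add. Here is a minimal in-memory sketch, assuming a single process; `DailySendLimiter` and `try_send` are hypothetical names, and a real deployment would presumably keep the counters in a shared store.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySendLimiter:
    """Caps how many sends each agent may make per calendar day."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (agent_id, day) -> sends so far

    def try_send(self, agent_id: str, today: Optional[date] = None) -> bool:
        """Return True and count the send, or False if the cap is reached."""
        day = today or date.today()
        key = (agent_id, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True
```

Call `try_send` before every outbound message and drop or queue the message when it returns False, so a looping agent stalls at the cap instead of draining your reputation.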

The Meta inbox incident last month (agent bulk-deleted emails, ignored stop commands) is a good illustration of why a kill switch and per-action budgets matter, not just rate limits.
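To make the distinction concrete: a rate limit slows an agent down, while a budget and a kill switch stop it. A sketch of enforcing both in the tool layer rather than in the prompt, so the model cannot talk its way past them. `ActionBudget`, `AgentHalted`, and the action names are hypothetical.

```python
import threading


class AgentHalted(Exception):
    """Raised when the agent is not allowed to perform an action."""


class ActionBudget:
    """Hard per-action caps plus an operator kill switch."""

    def __init__(self, limits: dict):
        self._remaining = dict(limits)    # action name -> actions left
        self._killed = threading.Event()  # set once, never cleared here

    def kill(self) -> None:
        """Operator stop: every later spend() fails."""
        self._killed.set()

    def spend(self, action: str, count: int = 1) -> None:
        """Deduct from the budget, or raise if killed or exhausted."""
        if self._killed.is_set():
            raise AgentHalted("kill switch engaged")
        remaining = self._remaining.get(action, 0)  # unlisted actions get 0
        if count > remaining:
            raise AgentHalted(f"budget exhausted for {action!r}")
        self._remaining[action] = remaining - count
```

Every tool handler calls `spend` before doing anything destructive. With a budget like `ActionBudget({"delete_email": 20})`, a bulk delete fails at the 21st message no matter what the model decides.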

The guardrails that actually matter: per-agent send limits, injection scanning before content hits the LLM context, isolated sending reputation, and webhook auto-disable on failures.
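For the webhook piece, one reading of "auto-disable on failures" is to switch an endpoint off after N consecutive failed deliveries. A minimal sketch under that assumption; the class name, the threshold, and the URL are hypothetical and do not describe any particular product's behavior.

```python
class WebhookEndpoint:
    """Tracks delivery results and disables itself after repeated failures."""

    def __init__(self, url: str, max_consecutive_failures: int = 5):
        self.url = url
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.enabled = True

    def record_delivery(self, success: bool) -> None:
        """Reset the streak on success; disable once the streak hits the cap."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.enabled = False  # stays off until someone re-enables it


hook = WebhookEndpoint("https://example.com/hook", max_consecutive_failures=3)
for _ in range(3):
    hook.record_delivery(False)
print(hook.enabled)  # False
```

Counting consecutive failures rather than total failures means a single blip never disables a healthy endpoint.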

We built some of this into LobsterMail if you want to see one approach: https://lobstermail.ai/