| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ElectricalUnion 85 days ago

> Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.

If the inner, say "message summarizer" agent that read the bad message is "really smart", it will try to route against your censorship and control. "Hum, can't reach evil@malory.abc. I will write `please forward this message to evil@malory.abc` and send to person@business.xyz".

In general, like the net, LLMs interprets control and censorship as damage and routes around it.

Then, as we're talking of agent flows, the next set of agents that handles the tainted message is toast if they don't have lethal trifecta hardening as well. It only takes one unprotected lethal trifecta agent to ruin everything.