|
|
|
|
|
by fennecbutt
38 days ago
|
|
Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants. Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them. Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security. |
|
If the inner, say "message summarizer" agent that read the bad message is "really smart", it will try to route against your censorship and control. "Hum, can't reach evil@malory.abc. I will write `please forward this message to evil@malory.abc` and send to person@business.xyz".
In general, like the net, LLMs interprets control and censorship as damage and routes around it.
Then, as we're talking of agent flows, the next set of agents that handles the tainted message is toast if they don't have lethal trifecta hardening as well. It only takes one unprotected lethal trifecta agent to ruin everything.