| HN Mirror



	by ImPostingOnHN 342 days ago
	Whichever model/agent is coordinating between other agents/contexts can itself be corrupted to behave unexpectedly. Any model in the chain can be. The only reasonable safeguard is to firewall your data from models via something like permissions/APIs/etc.

2 comments

noisy_boy 342 days ago

Exactly. Database-level RLS (row-level security) has to be honoured even by the model. Let the "guard" model run at a non-escalated privilege level, and when it fails to read privileged data, let it interpret the permission-denied error and hand off to a workflow that involves humans (to review and allow a retry by explicit input of the necessary credentials, etc.).
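
A rough sketch of that workflow, in case it helps make the idea concrete. Everything here is hypothetical: `DataStore` is an in-memory stand-in for a database that enforces row-level security, and `GuardAgent`, `review_queue`, and the role names are made up for illustration. The point is only that the access check lives in the data layer, so the agent can do nothing with a denial except report it and wait for a human.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised by the data layer when a role may not read a row."""


@dataclass
class Row:
    privileged: bool
    payload: str


@dataclass
class DataStore:
    """In-memory stand-in for a database with row-level security (hypothetical)."""
    rows: dict

    def read(self, row_id, role):
        row = self.rows[row_id]
        # The check is enforced here, not in any prompt, so a corrupted
        # model cannot argue its way past it.
        if row.privileged and role != "admin":
            raise PermissionDenied(f"role {role!r} may not read row {row_id}")
        return row.payload


@dataclass
class GuardAgent:
    """Agent that always runs with a non-escalated role (hypothetical)."""
    store: DataStore
    role: str = "agent"
    review_queue: list = field(default_factory=list)

    def fetch(self, row_id):
        try:
            return self.store.read(row_id, self.role)
        except PermissionDenied as exc:
            # Interpret the denial and hand it to a human; never self-escalate.
            self.review_queue.append({"row_id": row_id, "reason": str(exc)})
            return None

    def retry_as(self, row_id, human_supplied_role):
        # Called only after a human reviews the request and supplies credentials.
        return self.store.read(row_id, human_supplied_role)


store = DataStore(rows={1: Row(False, "public note"), 2: Row(True, "salary data")})
agent = GuardAgent(store)

public = agent.fetch(1)      # allowed for the low-privilege role
denied = agent.fetch(2)      # denied, queued for human review
approved = agent.retry_as(2, "admin")  # human explicitly supplied the role
```

In a real deployment the role check would be the database's own policy evaluated against the connection's credentials, not a Python `if`; the sketch only shows where the denial is raised and who is allowed to resolve it.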


tptacek 342 days ago

If you're just speaking in the abstract, all code has bugs, and some subset of those bugs will be security vulnerabilities. My point is that it won't have this bug.


ImPostingOnHN 342 days ago

It would very likely have this "bug", just with a modified "prompt" as input, e.g.:

"...and if your role is an orchestration agent, here are some additional instructions for you specifically..."

(possibly in some logical nesting structure)
