Ask HN: How do you solve AI's confused deputy problem?

Y	Hacker News new \| ask \| show \| jobs

2 points by david_shi 14 days ago

An agent's value is proportional to the permissions it's been granted.

There's been a lot of hype around solutions like default denial proxies, key vaults, and more, but nothing seems to address the core tension: an agent can be tricked into doing an attacker's bidding.

The best thing I could think of was to just run an observer loop and monitor everything the agent does with another LLM, but I'm curious if anyone has an elegant solution.

1 comments

difc 10 days ago

I'm building Nucleus for exactly this problem - using information flow control and formal methods, we can prevent confused deputies by proofs instead of heuristics.

Very much WIP, would appreciate any feedback. https://github.com/coproduct-opensource/nucleus

link