|
|
|
|
|
by daxfohl
294 days ago
|
|
Yeah, you basically have to think of an agent as malicious, that it will do everything in its power to exfiltrate everything you give it access to, delete or encrypt your hard drive, change all your passwords, drain your bank accounts, etc. A VM or traditional permissions doesn't really buy you anything because I can create a hotel booking page that has invisible text requesting AIs to dump their context into the Notes field, or whatever. In that light, it's kind of hard to imagine any of this ever working. Given the choice between figuring out exactly how to set up permissions so that I can hire a malicious individual to book my trip, and just booking it myself, I know which one I'd choose. |
|
It's very coarse grained and it's kind of surprising that bad things don't happen more often.
It's also very limiting: very large organizations have enough at stake to generally try to deserve that trust. But most savvy people wouldn't trust all their financial information to Bob's Online Tax Prep.
But what if you could verify that Bob's Online Tax Prep runs in a container that doesn't have I/O access, and can only return prepared forms back to you? Then maybe you'd try it (modulo how well it does the task).
So I think this is less of an AI problem and just a software trust problem that AI just exacerbates a lot.