|
|
|
|
|
by tptacek
53 days ago
|
|
There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets. A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use. There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options. [1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source). |
|
Giving the whole machine also doesn't answer the question for how the agent can hook into actions that eventually require more perms, and even if you "airgap" those via things like output queues that humans need to approve, that still feels "harnessey" to me.
I feel a bit guilty of debating semantics here, especially as I can't/don't intend to convey any confidence in a "right answer", but my reason for being pedantic is that I do think there are interesting tradeoffs between "P(jailbreak or unexpected capability use|time)" and "increasing power/available capability set", as well as interesting primitives emerging in terms of the components you'd need regardless of where you drew that line (ala paragraph 2.)
[0] - https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb51... (specifically section 5.5.2.4)