by benjosaur 110 days ago
yes exactly. with proper configuration (e.g. /sandbox with normal claude code) it is impossible for the agent to escape.

agent orchestrators/wrappers that aim to eliminate friction, however, can subtly override these proper setups, leading to this nasty scenario:

1) you assume anthropic's /sandbox is keeping you safe 2) the model reaffirms your belief that /sandbox is keeping you safe 3) you are not safe 4) you leave your agent running overnight and goal drift deletes your os