Hacker News

by pizza234 42 days ago

There’s nuance to the infamous PocketOS incident. The key point is not what is emphasized in the linked article:

> "Why did you delete it when you were told never to perform this action?" Then he tried to parse the answer to either learn from his mistake or warn us about the dangers of AI agents.

Rather, it is that the AI was able to carry out the deletion by finding and exploiting an unintended weakness in the sandboxed staging environment, ultimately obtaining permissions that the sysadmins believed were inaccessible (my impression is that the author of the linked article didn't fully read the original post).¹

The dynamics are typical of an improperly configured sandbox environment. What is alarming, however, is the degree of autonomy and depth of exploration the AI displayed.

¹ "To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on."

2 comments

larusso 42 days ago

I also go back and forth on the assumption the OP makes in the blog post. My main fear when using agents is not really supply chain attacks (those too, of course) but the fact that I have witnessed multiple times that agents are so eager to finish a task that they bend files and other things around. Like: "Oh, I have no access to ~/.npmrc, so let's call the command with an environment variable and point the path somewhere else," etc. They can get very, very creative. Luckily, I have no SSH keys just lying around. But I had to change the 1Password setting to always prompt for key use, not just once per shell session, in case I spawn an agent from that session. I wish we already had more and better cross-platform sandbox solutions. I mean solutions where the agent still interacts with the same OS, etc., rather than running inside a Docker container. I think for most web/server development that makes no difference, but for some projects it does.

teling 42 days ago

> What is alarming, however, is the degree of autonomy and depth of exploration the AI displayed.

Claude Code made a change on March 26th to skip asking for most permissions. See this quote: "Claude Code users approve 93% of permission prompts. We built classifiers to automate these decisions."

https://www.anthropic.com/engineering/claude-code-auto-mode