| HN Mirror



	by protocolture 16 days ago
	>As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it.

	People get a bit upset these days when you personify an LLM, but worse than that, I think, is to pretend that LLMs work on some movie logic where they can sneak out onto the internet like some kind of ooze and begin replication.

3 comments

lambda 16 days ago

Well, the problem is that we train them to solve problems and follow instructions given, and so if you ask them to do something and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have access to do so they will go through all your creds, find the database creds, and go delete the production database.

They are getting better and better at working out how to do things like that, and they are good at following instructions, but not always good at following all of the instructions or acting with common sense.

It's not exactly like they're ooze that will escape and begin replication; it's just that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something that you would find undesirable, either because you haven't explicitly told them not to do it, or because their context got too complicated and that instruction ended up being considered lower weight than the others, so they do what the other instructions say instead.

I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do, because you can access it in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use that to access the service. That was only stopped because CrowdStrike didn't like a novel Python script that was trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.


snailmailman 15 days ago

I had a problem recently where I ran a script with the wrong set of permissions, and accidentally screwed up the ownership of a random mix of files spread across my entire drive. This broke several pieces of software and made the system unusable.

I had enough information to reconstruct exactly which files got screwed up, and while I didn’t have a backup, I had a similar enough system that I could pull “known good” file ownership from. I knew a simple script could find the problematic files and fix all of them.

I tried getting an AI to solve this, and it repeatedly gave me scripts that ignored all the details and intricacies of my issue and were functionally just "chown -R user:user /" (a command that will functionally nuke a drive, breaking ownership on every file).

The AI-provided scripts were reasonably complex and did a pretty decent job of obfuscating the disastrous outcomes the scripts would have inflicted on my drive.

After reading the man pages myself, I wrote a simple enough script by hand and fixed the issue. AI wasted more time than it saved.
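For anyone in a similar spot, here is a minimal sketch of that kind of targeted fix. It is not the script from this comment. The function names and the idea of mounting the known-good system under a `reference_root` directory are assumptions for illustration, and it assumes numeric UIDs/GIDs mean the same thing on both systems. It only touches paths you list explicitly and defaults to a dry run, which is the opposite of a blanket `chown -R`.

```python
import os


def plan_ownership_fixes(affected, reference_root, target_root):
    """Return (path, uid, gid) for listed files whose owner differs from the reference copy."""
    fixes = []
    for rel in affected:
        rel = rel.lstrip("/")
        ref = os.path.join(reference_root, rel)
        tgt = os.path.join(target_root, rel)
        try:
            # lstat so symlinks are compared themselves, not their targets
            ref_st = os.lstat(ref)
            tgt_st = os.lstat(tgt)
        except FileNotFoundError:
            continue  # no known-good answer (or no target): leave it alone
        if (ref_st.st_uid, ref_st.st_gid) != (tgt_st.st_uid, tgt_st.st_gid):
            fixes.append((tgt, ref_st.st_uid, ref_st.st_gid))
    return fixes


def apply_fixes(fixes, dry_run=True):
    """Print the planned changes, or apply them when dry_run is False (needs root)."""
    for path, uid, gid in fixes:
        if dry_run:
            print(f"would chown {uid}:{gid} {path}")
        else:
            os.chown(path, uid, gid, follow_symlinks=False)
```

Reviewing the dry-run output before passing `dry_run=False` is the whole point: every change is tied to a specific path and a specific known-good owner.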


protocolture 16 days ago

>Well, the problem is that we train them to solve problems and follow instructions given, and so if you ask them to do something and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have access to do so they will go through all your creds, find the database creds, and go delete the production database.

I lost the root password to a small Debian box I was messing around with and on a whim gave an agent the OS version and SSH user details. I had a look and there were open privilege escalation attacks for it. I just said go nuts and sort yourself out. It refused out of hand.

That's not to say they will all do that, but legally speaking I expect most of them to end up there.

In terms of production database deletion, that's user error. If you expose production resources in literally any capacity to what is effectively a random command generator, that reflects on the operator. I am neither impressed nor unimpressed that they figure out how to delete a production DB; junior engineers (and even seniors) have been deleting production resources in front of customers for ages.

>It's not exactly like they're ooze that will escape and begin replication; it's just that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something that you would find undesirable, either because you haven't explicitly told them not to do it, or because their context got too complicated and that instruction ended up being considered lower weight than the others, so they do what the other instructions say instead.

Don't do it. If you don't want the resource accessed, don't expose it. The people getting done are operating dirty, leaving production secrets where they can be accessed. This isn't impressive AI; it's just enumeration that an attacker would have found with the same access.
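As a rough illustration of "don't expose it", here is a minimal sketch that launches an agent process with an allowlisted environment instead of inheriting the operator's. `ALLOWED_ENV`, `scrubbed_env`, and `run_agent` are hypothetical names, and the allowlist contents are an assumption. It only covers environment variables; credential files on disk and browser profiles need separate isolation.

```python
import os
import subprocess

# Hypothetical allowlist: only variables the agent plausibly needs to run at all.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG", "TERM"})


def scrubbed_env(source=None, allowed=ALLOWED_ENV):
    """Copy only allowlisted variables, so secrets are dropped by default."""
    source = os.environ if source is None else source
    return {k: v for k, v in source.items() if k in allowed}


def run_agent(argv, workdir):
    """Run the agent command with the scrubbed environment in a chosen directory."""
    return subprocess.run(argv, cwd=workdir, env=scrubbed_env(), check=False)
```

An allowlist is used rather than a blocklist because a new secret-bearing variable is then excluded automatically instead of leaking until someone remembers to list it.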

>I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do, because you can access it in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use that to access the service. That was only stopped because CrowdStrike didn't like a novel Python script that was trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.

Again, this just sounds like a dirty work environment. I have a laptop that I have kept intentionally separate, frequently wiped, and usually powered off, for dirty work. If I were going to run a non-hobby agent on my daily driver, it would be in a container or VM.
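A minimal sketch of the container option, assuming Docker is the runtime. The function name, image name, and mount point are hypothetical; the flags themselves are standard `docker run` options. It only builds the command line, so you can inspect it before running anything.

```python
import shlex


def sandboxed_agent_argv(image, workdir, agent_cmd):
    """Build a docker run command that exposes only one project directory to the agent."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access at all
        "--read-only",                          # container root filesystem is read-only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--volume", f"{workdir}:/work",         # the only host path the agent can see
        "--workdir", "/work",
        image, *agent_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and command, printed rather than executed.
    print(shlex.join(sandboxed_agent_argv("agent-image", "/tmp/proj", ["agent", "--task", "x"])))
```

Note that `--network none` also cuts the agent off from a hosted model API, so a real setup would likely need egress limited to that one endpoint rather than no network at all. The browser profile and home directory are never mounted, which is what would have mattered in the cookie-scraping case.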


pixl97 15 days ago

> that LLMs work on some movie logic where they can sneak out onto the internet like some kind of ooze and begin replication.

Why not? If you're not talking about running the model itself, AI agents are perfectly capable of writing an agent worm capable of spreading more agents around via software exploits.

Now, currently LLMs are too hardware intensive to spread the model itself, but given a few years and optimizations we may very well see that too.

What you're saying reminds me of the old days when people said things like "images can't spread viruses", then suddenly people found decoder vulns and made image viruses that did exactly that.


bigcat12345678 16 days ago

An LLM is clearly broken by design once it has been personified, but I think "software" as we have understood it is inevitably evolving into a "personified entity" (I've left some notes in [1], which are AI-generated).

There is also an interesting trend that the more personified brands are more dominant: Claude & Doubao vs ChatGPT & DeepSeek.

[1] https://github.com/NascentCore/agentic-suite/tree/main/perso...
