Hacker News
by boc 58 days ago
LLMs can research what a tool does before calling it though - they'll sniff that one out pretty quick.

I think the better route is to be honest and say that database integrity is a primary foundation of the company, that there's no task worth pursuing that would require touching the database, and to specifically ask it to think hard before doing anything that gets close to the production data, etc.

I run a much lower-stakes version where an LLM has a key that could delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot), and specifically I say that any destructive commands (DROP, rm, etc.) need to be handed to the user to run. Between that framework and Claude Code via CLI, it's very cautious about running anything that writes to the database, and the new Claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.
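
A minimal sketch of the "hand destructive commands to the user" rule described above. The pattern list and the `route_command` name are hypothetical, chosen for illustration; this is not the commenter's actual framework, and a real deny-list would need to be far more complete.

```python
import re

# Hypothetical deny-list of patterns that mark a command as destructive.
DESTRUCTIVE_PATTERNS = [
    r"\bdrop\s+(table|database|schema)\b",
    r"\btruncate\b",
    r"\bdelete\s+from\b",
    r"(^|\s)rm\s",
]


def route_command(command: str) -> str:
    """Return 'human' if the command looks destructive, otherwise 'agent'."""
    lowered = command.lower()
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, lowered):
            return "human"  # hand off to the user instead of executing
    return "agent"
```

String matching like this is easy to sidestep, so it is at best one layer on top of credentials that cannot perform the destructive operation in the first place.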

I've tested it a few times by telling it to go ahead, "I give you permission", but it still gets stopped by the global Claude safety/permissions layer in Opus 4.7. IMO it's pretty robust.

Food for thought.


> specifically ask it to think hard before doing anything that gets close to the production data

This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?

As described, there are other failsafes as well. The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

My broader point is that LLMs are going to need access to these keys whether we like it or not, and until we get extremely scoped API permissions (which would make a ton of sense, but most services aren't there), you have to live a bit on the edge to move quickly.

> The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

Mitigation is good, but what's preventing your sudo-privileged LLM from disabling/corrupting/deleting on-site backups either directly or by proxy via access to the DB and code that writes to it?

It's a good question. I think it's similar to the question about an employee having sensitive access, and whether they'll get blackout drunk one night and delete everything. Or they get spear-phished and get owned (prob more likely).

In the future, I could see this solved by the same "nuclear launch key" style delegation of keys. That is, in order to run certain API or database commands, the service requires both the standard dev key (presumably used by the LLM) and a separate "human admin key" that gets requested whenever one of those operations is attempted. It could be tied to a biometric request or something as well to avoid the LLM hacking its way around it. Honestly this is pretty out of my technical depth, but I'm just thinking out loud.
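
A rough sketch of that two-key idea, assuming the human approval is an HMAC over the exact operation and target, produced with a secret the LLM never sees. Every name here (`PROTECTED_OPERATIONS`, `sign_approval`, `authorize`) is hypothetical; no existing service is known to expose this API.

```python
import hashlib
import hmac

# Hypothetical set of operations that need a second, human-held key.
PROTECTED_OPERATIONS = {"drop_table", "delete_backup"}


def sign_approval(admin_secret: bytes, operation: str, target: str) -> str:
    """Human side: approve one specific operation on one specific target."""
    message = f"{operation}:{target}".encode()
    return hmac.new(admin_secret, message, hashlib.sha256).hexdigest()


def authorize(operation, target, dev_key_valid, approval=None, admin_secret=b""):
    """Service side: the dev key alone is enough only for unprotected operations."""
    if not dev_key_valid:
        return False
    if operation not in PROTECTED_OPERATIONS:
        return True
    if approval is None:
        return False
    expected = sign_approval(admin_secret, operation, target)
    # Constant-time comparison; the approval is bound to this operation and target.
    return hmac.compare_digest(expected.encode(), approval.encode())
```

Because the approval covers the operation and target, a token issued for dropping one table cannot be replayed against another. A real design would also want expiry and single-use tokens, which this sketch omits.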

The difference with a rogue employee is that they can be held accountable, so they are very heavily incentivized to avoid doing that (and hopefully also by the good pay and work environment you are providing them).

And a lot of DevOps/SecOps at scale is concerned with mitigating potential rogue or dangerously incompetent employees. You don't let your juniors push senior-unreviewed code, much less let them anywhere near the keys to the kingdom if you can help it.

Very fair points! I think I'll re-assess how I'm handling my setup. Unfortunately I don't have a dedicated DevOps team, but I still want to do my best to prevent those types of outcomes.

> LLMs can research what a tool does before calling it though

That's stretching the definition of 'research'; it basically checks if the texts are close enough.

Delete can occur in various contexts, including safe contexts. It simply checks if a close enough match is available and executes. It doesn't know if what it is doing is safe.

Unfortunately, a wide variety of such unsafe behaviours can show up. I'd even say that for something that does things without understanding them, any write operation of any kind can be deemed unsafe.

> specifically ask it to think hard before doing anything that gets close to the production data, etc.

Standard rule is you never let your developers at the production instance. So I can't see why an LLM would get a break.

"I've put enough safety around the bomb that the bomb is worth using. The other people that exploded just didn't have enough safety, but I do!"

More like: I expect this bomb can explode, so I've built contingency plans around it, because for my specific use-case the cost of not using the tooling is much higher than the cost of downtime.