Hacker News
by markive 1089 days ago
There is nothing malicious about ALTER TABLE or DROP TABLE commands. They have valid use cases and are not something an LLM needs to guard against.

If a bad-actor can issue these commands against your DB, you are already toast!


> If a bad-actor can issue these commands against your DB, you are already toast!

Don't overlook the damage potential of a fresh-faced college hire on call at 2am with DBA access to prod.

When I was 18, at my first dev job, I was put in charge of trying to migrate and modernize an old PHP app for a client.

I had been there maybe 6 months. For some reason I can't recall, I was meant to delete the staging RDS database.

Well, the databases weren't given human-readable names; they were both long autogenerated strings.

I deleted the staging database and then prod stopped working

Whoops

Again, you are already toast. No one should have access to prod except the promote automation.

I think for many (though not necessarily most) businesses, especially micro-ISVs running a SaaS, the business cost of doing what they "should" be doing far exceeds the actual cost of letting bad things happen and paying for the cleanup afterwards.

As the designated "code-and-Red Bull-guy" at the micro-ISV I work at, I'll admit that I've unintentionally nuked the production DB at 2am - but fortunately our cloud database provider could do a point-in-time restore and everything was fully operational again by 2:30am. That's because the cost of setting up infrastructure and procedures to eliminate the need to ever manually run DML/DDL against our prod databases would be... probably a multiple of my salary - and be required indefinitely into the future (as that infrastructure would have to be maintained as the database's design changes over time too) - whereas the cost of having PITR on our prod databases in Azure is... a rounding error.

So yes, our prod is going to go down in the future - we can't afford not to, honestly (it's a USA-only B2B SaaS; we get literally zero usage before 6am EST or after 7pm PST).

The college-hire is not the problem; it is the person who gave the kid prod access, or perhaps at a higher level, the person who architected the DB permissions structure. If one person alone can cause a production SEV, many things had to have gone wrong by many other people beforehand.

I remember having a test suite that would connect to a local DB running in a Docker container, nuke the tables, and then set up the records in a known state before running. It worked great until someone changed the connection string to point at an actual database.

One possible trick you might consider is to (manually!) add objects to the DB's schema that explicitly indicate its environment, e.g. `CREATE TABLE dbo.ThisIsProductionYouSillyPerson ( DummyCol int NOT NULL );` for prod and `CREATE TABLE dbo.ThisIsTestFeelFreeToMessAround ( DummyCol int NOT NULL );` - these tables would be excluded from the automated DB deployment code - and write test scripts that all start by checking that the `dbo.ThisIsTestFeelFreeToMessAround` table exists and that the `dbo.ThisIsProductionYouSillyPerson` table does not exist in the DB before continuing.
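A minimal sketch of that pre-flight check, written in Python against SQLite purely so it runs anywhere. SQLite has no `dbo` schema, so this looks the sentinel tables up in `sqlite_master`; on SQL Server you would query its catalogue views instead. The helper names (`table_exists`, `assert_safe_test_db`) are made up for illustration.

```python
import sqlite3

# Sentinel table names modelled on the ones suggested above (created manually,
# never by the automated deployment code).
TEST_SENTINEL = "ThisIsTestFeelFreeToMessAround"
PROD_SENTINEL = "ThisIsProductionYouSillyPerson"


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with exactly this name exists in the main schema."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def assert_safe_test_db(conn: sqlite3.Connection) -> None:
    """Refuse to continue unless the DB is explicitly marked as a test DB."""
    if table_exists(conn, PROD_SENTINEL):
        raise RuntimeError(f"{PROD_SENTINEL} exists: refusing to touch this DB")
    if not table_exists(conn, TEST_SENTINEL):
        raise RuntimeError(f"{TEST_SENTINEL} is missing: refusing to touch this DB")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulates the one-off manual step that marks this DB as a test DB.
    conn.execute(f"CREATE TABLE {TEST_SENTINEL} (DummyCol INTEGER NOT NULL)")
    assert_safe_test_db(conn)  # passes: only the test sentinel exists
    print("ok to nuke and reseed")
```

Checking both conditions matters: a DB with neither table (say, a freshly restored prod copy) is refused too, so the check fails closed rather than open.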

DB automation is great for preventing mistakes during common routine operations, but DB automation can also go haywire and drop or delete things all by itself unless you set up out-of-band (if that's the right term?) safeguards. Having air-gapped dev, test, staging, and prod environments won't help you if you forgot the `WHERE` in an `UPDATE` in a little-used script that the prod automation uses and that testing never caught (which happens all the time; it's scary).

I do appreciate that MySQL comes with an UPDATE-without-key guard (`sql_safe_updates`), but I'm surprised none of the other RDBMSs have safeguards like that; just a simple `RequireManualConfirmationForMultiRowDml` flag on a table would help.
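That flag is hypothetical, but a rough application-side approximation is possible: run the statement inside a transaction, look at how many rows it touched, and roll back if that exceeds a limit. This sketch uses Python's built-in `sqlite3` module; `guarded_dml`, `TooManyRowsError` and the `max_rows` default are illustrative names, not an existing API.

```python
import sqlite3


class TooManyRowsError(RuntimeError):
    """Raised when a DML statement touches more rows than the caller allowed."""


def guarded_dml(conn: sqlite3.Connection, sql: str, params=(), max_rows: int = 1) -> int:
    """Run an UPDATE/DELETE in its own transaction; roll back if it affects
    more than max_rows rows, otherwise commit and return the row count."""
    conn.execute("BEGIN")
    try:
        cur = conn.execute(sql, params)
        count = cur.rowcount  # rows changed by this UPDATE/DELETE
        if count > max_rows:
            raise TooManyRowsError(f"{count} rows affected (limit {max_rows}); rolled back")
    except BaseException:
        conn.execute("ROLLBACK")  # undo the statement on any failure
        raise
    conn.execute("COMMIT")
    return count


if __name__ == "__main__":
    # isolation_level=None: only our explicit BEGIN/COMMIT/ROLLBACK control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT)")
    conn.executemany("INSERT INTO users (plan) VALUES (?)", [("free",), ("free",), ("pro",)])

    guarded_dml(conn, "UPDATE users SET plan = 'pro' WHERE id = ?", (1,))  # 1 row: committed
    try:
        guarded_dml(conn, "UPDATE users SET plan = 'pro'")  # forgot the WHERE: 3 rows
    except TooManyRowsError as exc:
        print(exc)  # prints: 3 rows affected (limit 1); rolled back
```

Unlike a purely syntactic "no `WHERE`" check, counting affected rows also catches a `WHERE` clause that is present but wrong.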

I've only ever queried (very large) databases, but my eyes always go a bit wide when I see statements that modify tables. They scare me. They scare me even when I run them on a SQLite table I made five minutes ago for an experiment.

I see the problem as much, much more insidious than the expected threat vector. Over the past few months, many of us have seen these models get worse at keeping track of details and hallucinate more.

They mix up information within their context window, and the cope OpenAI has given us for their declining output quality is... more context! Great.

So what happens when that context window (and you have no real idea how they're actually implementing it) has the concept of "DROP" in it? Or what happens when it's a long day, you looked it over and it's all correct, but something changed in some buried inner query? Probably it just costs some time to debug, bu..

Obviously there should be a few safeguards before that query gets executed, but I never want to see an increasingly cheap and widespread black box like GPT be able to "speak" a word that could, in principle, cause six- or seven-figure damages or worse.
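One such safeguard, sketched with SQLite because Python's `sqlite3` module exposes SQLite's authorizer hook (`Connection.set_authorizer`): every generated statement is checked while it is being prepared, and anything that isn't a plain read is refused before it can run. The allowlist is deny-by-default rather than a blocklist of `DROP`/`ALTER`, so statement types nobody thought of are refused too (at the cost of also rejecting things like `PRAGMA` or recursive CTEs). `read_only_authorizer` and `lock_down` are illustrative names; on a server RDBMS the usual equivalent would be a separate read-only database account.

```python
import sqlite3

# Authorizer action codes we allow; everything else (DROP, ALTER, INSERT,
# UPDATE, DELETE, PRAGMA, ATTACH, ...) is denied while the statement is prepared.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Authorizer callback: permit reads, deny anything else (deny by default)."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def lock_down(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the read-only authorizer on a connection used for generated SQL."""
    conn.set_authorizer(read_only_authorizer)
    return conn


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("INSERT INTO orders (total) VALUES (9.5), (20.0)")
    lock_down(conn)

    # A generated SELECT goes through as normal.
    print(conn.execute("SELECT count(*) FROM orders").fetchall())
    # A generated DROP is rejected before it can execute.
    try:
        conn.execute("DROP TABLE orders")
    except sqlite3.DatabaseError as exc:
        print("blocked:", exc)
```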

We don't let actively hallucinating people brandish firearms for a reason