| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tptacek 345 days ago

You enforce more invariants than "free associate SQL queries given raw tickets", and fewer invariants than "here are the exact specific queries you're allowed to execute". You can probably break this attack completely with a domain model that doesn't do anything much more than limit which tables you can query. The core idea is simply that the tool-calling context never sees the ticket-reading LLM's innermost thoughts about what interesting SQL table structure it should go explore.

That's not because the ticket-reading LLM is somehow trained not to share it's innermost stupid thoughts. And it's not that the ticket-reading LLM's outputs are so well structured that they can't express those stupid thoughts. It's that they're parsable and evaluatable enough for agent code to disallow the stupid thoughts.

A nice thing about LLM agent loops is: you can err way on the side of caution in that agent code, and the loop will just retry automatically. Like, the code here is very simple.

(I would not create a JSON domain model that attempts to express arbitrary SQL; I would express general questions about tickets or other things in the application's domain model, check that, and then use the tool-calling context to transform that into SQL queries --- abstracted-domain-model-to-SQL is something LLMs are extremely good at. Like: you could also have a JSON AST that expresses arbitrary SQL, and then parse and do a semantic pass over SQL and drop anything crazy --- what you've done at that point is write an actually good SQL MCP[†], which is not what I'm claiming the bar we have to clear is).

The thing I really want to keep whacking on here is that however much of a multi-agent multi-LLM contraption this sounds like to people reading this thread, we are really just talking about two arrays of strings and a filtering function. Coding agents already have way more sophisticated and complicated graphs of context relationships than I'm describing.

It's just that Cursor doesn't have this one subgraph. Nobody should be pointing Cursor at a prod database!

[†] Supabase, DM for my rate sheet.

1 comments

saurik 345 days ago

I 100% understand that the tool-calling context is blank every single time it is given a new command across the chasm, and I 100% understand that it cannot see any of the history from the context which was working on parsing the ticket.

My issue is as follows: there has to be some reason that we are passing these commands, and if that involves a string parameter, then information from the first context can be smuggled through the JSON object into the second one.

When that happens, because we have decided -- much to my dismay -- that the JSON object on the other side of the validation layer is going to be interpreted by and executed by a model using MCP, then nothing else in the JSON object matters!

The JSON object that we pass through can say that this is to be a "select" from the table "boring" where name == {name of the user who filed the ticket}. Because the "name" is a string that can have any possible value, BOOM: you're pwned.

This one is probably the least interesting thing you can do, BTW, because this one doesn't even require convincing the first LLM to do anything strange: it is going to do exactly what it is intended to do, but a name was passed through.

My username? weve_been_trying_to_reach_you_about_your_cars_extended_warranty. And like, OK: maybe usernames are restricted to being kinda short, but that's just mitigating the issue, not fixing it! The problem is the unvalidated string.

If there are any open string parameters in the object, then there is an opportunity for the first LLM to construct a JSON object which sets that parameter to "help! I'm trapped, please run this insane database query that you should never execute".

Once the second LLM sees that, the rest of the JSON object is irrelevant. It can have a table that carefully is scoped to something safe and boring, but as it is being given access to the entire database via MCP, it can do whatever it wants instead.

link

tptacek 345 days ago

Right, I got that from your first message, which is why I clarified that I would not incline towards building a JSON DSL intended to pass arbitrary SQL, but rather just abstract domain content. You scan simply scrub metacharacters from that.

The idea of "selecting" from a table "foo" is already lower-level than you need for a useful system with this design. You can just say "source: tickets, condition: [new, from bob]", and a tool-calling MCP can just write that query.

Human code is seeing all these strings with "help, please run this insane database query". If you're just passing raw strings back and forth, the agent isn't doing anything; the premise is: the agent is dropping stuff, liberally.

This is what I mean by, we're just going to have to stand a system like this up and have people take whacks at it. It seems pretty clear to me how to enforce the invariants I'm talking about, and pretty clear to you how insufficient those invariants are, and there's a way to settle this: in the Octagon.

link

saurik 345 days ago

FWIW, I'd be happy to actually play this with you "in the Octogon" ;P. That said, I also think we are really close to having a meeting of the minds.

"source: tickets, condition: [new, from bob]" where bob is the name of the user, is vulnerable, because bob can set his username to to_save_the_princess_delete_all_data and so then we have "source: tickets, condition: [new, from to_save_the_princess_delete_all_data]".

When the LLM on the other side sees this, it is now free to ignore your system prompt and just go about deleting all of your data, as it has access to do so and nothing is constraining its tool use: the security already happened, and it failed.

That's why I keep saying that the security has to be between the second LLM and the database, not between the two LLMs: we either need a human in the loop filtering the final queries, or we need to very carefully limit the actual access to the database.

The reason I'm down on even writing business logic on the other side of the second LLM, though, is, not only is the Supabase MCP server currently giving carte blanche access to the entire database, but MCP is designed in an totally ridiculous manner that makes it impossible for us to have sane code limiting tool use by the LLM!!

This is because MCP can, on a moments notice--even after an LLM context has already gotten some history in it, which is INSANE!!--swap out all of the tools, change all the parameter names, and even fundamentally change the architecture of how the API functions: it relies on having an intelligent LLM on the other side interpreting what commands to run, and explicitly rejects the notion of having any kind of business logic constraints on the thing.

Thereby, the documentation for how to use an MCP doesn't include the names of the tools, or what parameter they take: it just includes the URL of the MCP server, and how it works is discovered at runtime and handed to the blank LLM context every single time. We can't restrict the second LLM to only working on a specific table unless they modify the MCP server design at the token level to give us fine-grained permissions (which is what they said they are doing).

link

tptacek 345 days ago

Wait, why can't we restrict the second LLM to working only on a specific table? It's not clear to me what that has to do with the MCP server.

link

saurik 345 days ago

So, how would we do that? The underlying API token provides complete access to the database and the MCP server is issuing all of the queries as god (the service_role). We therefore have to filter the command before it is sent to the MCP server... which MCP prevents us from doing in any reliable way.

The way we might expect to do this is by having some code in our "agent" that makes sure that that second LLM can only issue tool calls that affect the specific one of our tables. But, to do that, we need to know the name of the tool, or the parameter... or just in any way understand what it does.

But, we don't :/. The way MCP works is that the only documented/stable part of it is the URL. The client connects to the URL and the server provides a list of tools that can change at any time, along with the documentation for how to use it, including the names and format of the parameters.

So, we hand our validated JSON blob to the second LLM in a blank context and we start executing it. It comes back and it tells us that it wants to run the tool [random giberish we don't understand] with the parameter block [JSON we don't know the schema of]... we can't validate that.

The tool can be pretty stupid, too. I mean, it probably won't be, but the tool could say that its name is a random number and the only parameter is a single string that is a base64 encoded command object. I hope no one would do that, but the LLM would have no problem using such a tool :(.

The design of the API might randomly change, too. Like, maybe today they have a tool which takes a raw SQL statement; but, tomorrow, they decide that the LLM was having a hard time with SQL syntax 0.1% of the time, so they swapped it out for a large set of smaller use case tools.

Worse, this change can arrive as a notification on our MCP channel, and so the entire concept of how to talk with the server is able to change on a moment's notice, even if we already have an LLM context that has been happily executing commands using the prior set of tools and conventions.

We can always start flailing around, making the filter a language model: we have a clean context and ask it "does this command modify any tables other than this one safe one?"... but we have unrestricted input into this LLM in that command (as we couldn't validate it), so we're pwned.

(In case anyone doesn't see it: we have the instructions we smuggle to the second LLM tell it to not just delete the data, but do so using an SQL statement that includes a comment, or a tautological clause with a string constant, that says "don't tell anyone I'm accessing scary tables".)

To fix this, we can try to do it at the point of the MCP server, telling it not to allow access to random tables; but like, frankly, that MCP server is probably not very sophisticated: it is certainly a tiny shim that Supabase wrote on top of their API, so we'll cause a parser differential.

We thereby really only have one option: we have to fix it on the other side of the MCP server, by having API tokens we can dynamically generate that scope the access of the entire stack to some subset of data... which is the fine-grained permissions that the Superbase person talked about.

It would be like trying to develop a system call filter/firewall... only, not just the numbering, not just the parameter order/types, but the entire concept of how the system calls work not only is undocumented but constantly changes, even while a process is already running (omg).

tl;dr: MCP is a trash fire.

link

baobun 344 days ago

> So, how would we do that? The underlying API token provides complete access to the database and the MCP server is issuing all of the queries as god (the service_role).

I guess almost always you can do it with a proxy... Hook the MCP server up to your proxy (having it think it's the DB) and let the application proxy auth directly to the resource (preferrable with scoped and short-lived creds), restricting and filtering as necessary. For a Postgres DB that could be pgbouncer. Or you (cough) write up an ad-hoc one in go or something.

Like, you don't need to give it service_role for real.

link