| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by saurik 342 days ago

(EDIT: THIS WAS WRONG.) [[FWIW, I definitely can imagine that (and even described multiple ways of doing that in a lightweight manner: pattern whitelisting and fine-grained permissions); but, that isn't what everyone has been calling an "agent" (aka, an LLM that is able to autonomously use tools, usually, as of recent, via MCP)? My best guess is that the use of "agent code" didn't mean the same version of "agent" that I've been seeing people use recently ;P.]]

EDIT TO CORRECT: Actually, no, you're right: I can't imagine that! The pattern whitelisting doesn't work between two LLMs (vs. between an LLM and SQL, where I put it; I got confused in the process of reinterpreting "agent") as you can still smuggle information (unless the queries are entirely fully baked, which seems to me like it would be nonsensical). You really need a human in the loop, full stop. (If tptacek disagrees, he should respond to the question asked by the people--jstummbillig and stuart73547373--who wanted more information on how his idea would work, concretely, so we can check whether it still would be subject to the same problem.)

NOT PART OF EDIT: Regardless, even if tptacek meant adding trustable human code between those two LLM+MCP agents, the more important part of my comment is that the issue tracking part is a red herring anyway: the LLM context/agent/thing that has access to the Supabase database is already too dangerous to exist as is, because it is already subject to occasionally seeing user data (and accidentally interpreting it as instructions).

2 comments

tptacek 342 days ago

It's fine if you want to talk about other bugs that can exist; I'm not litigating that. I'm talking about foreclosing on this bug.

link

lotyrin 342 days ago

I actually agree with you, to be clear. I do not trust these things to make any unsupervised action, ever, even absent user-controlled input to throw wrenches into their "thinking". They simply hallucinate too much. Like... we used to be an industry that saw value in ECC memory because a one-in-a-million bit flip was too much risk, that understood you couldn't represent arbitrary precision numbers as floating point, and now we're handing over the keys to black boxes that literally cannot be trusted?

link