Hacker News new | ask | show | jobs
by zbentley 104 days ago
This is substantially worse.

SQL injection still happens a lot, it’s true, but the fix when it does is always the same: SQL clients have an ironclad way to differentiate instructions from data; you just have to use it.

LLMs do not have that, yet. If an LLM can take privileged actions, there’s no deterministic, ironclad way to indicate “this input is untrusted, treat it as data and not instructions”. Sternly worded entreaties are as good as it gets.

3 comments

Yea. It's a pretty lol-sob future when I think about it. I imagine the agent frameworks eventually getting trusted actors and RBAC like features. Users end up in "confirm this action permanently/temporarily" loops. But then someone gets their account compromised and it gets used to send messages to folks who trust them. Or even worse, the attacker silently adds themselves to a trusted list and quietly spends months exfiltrating data without being noticed.

We'll probably also have some sub agent inspecting what the main agent is doing and it'll be told to reach out to the owner if it spots suspicious exfiltration like behaviour. Until someone figures out how to poison that too.

The innovation factor of this tech while cool, drives me absolutely nuts with its non deterministic behaviour.

It's like the evil twin of "code is data"
Sorry, I wasn’t trying to make a statement about better/worse or technical equivalence, just that it’s similar.