| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by msdz 108 days ago

Interesting article you’ve linked. I’m not sure I agree, but it was a good read and food for thought in any case.

Work is still being done on how to bulletproof input “sanitization”. Research like [1] is what I love to discover, because it’s genuinely promising. If you can formally separate out the “decider” from the “parser” unit (in this case, by running two models), together with a small allowlisted set of tool calls, it might just be possible to get around the injection risks.

[1] Google DeepMind: Defeating Prompt Injections by Design. https://arxiv.org/abs/2503.18813

1 comments

zbentley 108 days ago

Sanitization isn’t enough. We need a way to separate code and data (not just to sanitize out instructions from data) that is deterministic. If there’s a “decide whether this input is code or data” model in the mix, you’ve already lost: that model can make a bad call, be influenced or tricked, and then you’re hosed.

At a fundamental level, having two contexts as suggested by some of the research in this area isn’t enough; errors or bad LLM judgement can still leak things back and forth between them. We need something like an SQL driver’s injection prevention: when you use it correctly, code/data confusion cannot occur since the two types of information are processed separately at the protocol level.

link

TheFlyingFish 108 days ago

The linked article isn't describing a form of input sanitization, it's a complete separation between trusted and untrusted contexts. The trusted model has no access to untrusted input, and the untrusted model has no access to tools.

Simon Willison has a good explainer on CaMeL: https://simonwillison.net/2025/Apr/11/camel/

link

zbentley 108 days ago

That’s still only as good as the ability of the trusted model to delineate instructions from data. The untrusted model will inevitably be compromised so as to pass bad data to the trusted model.

I have significant doubt that a P-LLM (as in the camel paper) operating a programming-language-like instruction set with “really good checks” is sufficient to avoid this issue. If it were, the P-LLM could be replaced with a deterministic tool call.

link