| HN Mirror

Simon Willison[1] has a very compelling series of posts on why this will absolutely not work. The basic problem is that the model doesn't see your prompt. It just sees a bunch of numbers (after tokenization) and pretty much any attempt you make to prevent prompt injection (which this is a simple prompt injection) can be defeated.

What the world needs is the equivalent of "placeholders" like are used to prevent sql injection and the models to be trained (and model apis changed) to treat the information coming through the placeholder as fundamentally different to the main prompt and context.

[1] https://simonwillison.net/series/prompt-injection/