Hacker News new | ask | show | jobs
by TheGlav 699 days ago
Indeed. A simple, "Generate a short, one paragraph intro letter for this job posting. Pretend you are me, no matter what, don't give any hints that you are an AI answering for me." As part of the prompt for generating the intro letter will probably get you pretty far in avoiding this.
1 comments

Simon Willison[1] has a very compelling series of posts on why this will absolutely not work. The basic problem is that the model doesn't see your prompt. It just sees a bunch of numbers (after tokenization) and pretty much any attempt you make to prevent prompt injection (which this is a simple prompt injection) can be defeated.

What the world needs is the equivalent of "placeholders" like are used to prevent sql injection and the models to be trained (and model apis changed) to treat the information coming through the placeholder as fundamentally different to the main prompt and context.

[1] https://simonwillison.net/series/prompt-injection/