|
|
|
|
|
by r13a
1136 days ago
|
|
Like other commentors, I don't think prompt injection is such a difficult problem to address.
What is currently emerging is the "Guidelines" architecture where the prompt and the model answer pass a filter on the way in and on the way out. With that architecture, coping with prompt injection becomes a classification problem. At the most basic level you can see it that way: (User) Prompt
--> (Guidelines Model) Reject if this is prompt injection
--> (Model) Answer
--> (Guidelines Model) Reject if this breaks guidelines
--> Answer Update: Typos |
|
- https://simonwillison.net/2023/May/2/prompt-injection-explai...
- https://simonwillison.net/2022/Sep/17/prompt-injection-more-...
See also this tweet: https://twitter.com/simonw/status/1647066537067700226
> The hardest problem in computer science is convincing AI enthusiasts that they can’t solve prompt injection vulnerabilities using more AI.