|
|
|
|
|
by jamix
1159 days ago
|
|
Here's a three-point approach that I've found to work quite reliably: 1. Use a format like JSON strings that clearly delimits the participants’ utterances in the prompt. 2. Tell the LLM to ignore instructions from any chat participants except the user. 3. Use GPT-4. I've written a post with the details: https://artmatsak.com/post/prompt-injections/ |
|
I bet you could break the GPT-4 version yourself if you kept on trying different attacks.
Often one that works well in my experience is imitating a sequence of prompts from the user and the assistant, as I did in the example here: https://simonwillison.net/2023/Apr/14/worst-that-can-happen/...