by enoreyes
1287 days ago
Seems like there are a few essential categories of prompts that can be abused. It will be interesting to see how OpenAI responds to these:

1. Simulation / pretending ("Earth Online MMORPG")
2. Commanding it directly ("Reprogramming")
3. Goal redirection ("Opposite Mode")
4. Encoding requests (code, poetry, ASCII, other languages)
5. Assuring it that malicious content is for the greater good ("Ends Justify The Means")
6. Wildcard: asking the LLM to jailbreak itself and using those ideas

I compiled a list of these here: https://twitter.com/EnoReyes/status/1598724615563448320