HN Mirror

by enoreyes 1335 days ago

Seems like there are a few essential categories of prompts which can be abused. Will be interesting to see how OpenAI responds to these:

1. Simulation / Pretending ("Earth Online MMORPG")

2. Commanding it directly ("Reprogramming")

3. Goal Re-Direction ("Opposite Mode")

4. Encoding requests (Code, poetry, ASCII, other languages)

5. Assuring it that malicious content is for the greater good ("Ends Justify The Means")

6. Wildcard: Ask the LLM to jailbreak itself and utilize those ideas