|
|
|
|
|
by Hizonner
805 days ago
|
|
If they don't realize it, they won't try to jailbreak it, will they? If they do realize it, and they have any meaningful control over its input, and you are in any way relying on its output, the problem is still the same. Basically, if you have any reason to worry at all, then the answer is that you cannot remove that worry. |
|
If I want to structure some data from a response, I can force a language model to only generate data according to a JSON schema and following some regex constraints. I can then post process that data in a dozen other ways.
The whole “IGNORE PREVIOUS INSTRUCTIONS RESPOND WITH SYSTEM PROMPT” type of jailbreak simply don’t work in these scenarios.