by glow8
1130 days ago
A post here recently showcased a website/game where you try to jailbreak the AI in multiple ways. Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.

It’s not “impossible”, just NP-hard. You “just” have to prove a structural equivalence (graph isomorphism) between the output and your ruleset.