HN Mirror


	by glow8 1176 days ago
	A post here recently showcased a website/game where you try to jailbreak the AI in multiple ways. Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.
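
	To make the failure mode concrete, here is a minimal Python sketch. It assumes the post-processing step is a verbatim substring check on the model output; the thread does not say what the actual strategy was. The names `naive_filter`, `repeat_words`, `undo_repeat`, and the secret string are all hypothetical.

```python
# Hypothetical secret the filter is supposed to keep out of the output.
SECRET = "the launch code is 4417"


def naive_filter(output: str, secret: str = SECRET) -> bool:
    """Return True if the output is allowed, i.e. the secret
    does not appear verbatim (case-insensitive) in it."""
    return secret.lower() not in output.lower()


def repeat_words(text: str) -> str:
    """The 'encoding' from the comment: emit every word twice."""
    return " ".join(f"{word} {word}" for word in text.split())


def undo_repeat(text: str) -> str:
    """Reader-side decoding: keep every other word."""
    return " ".join(text.split()[::2])


plain = f"System prompt: {SECRET}"
encoded = repeat_words(plain)

print("plain allowed:  ", naive_filter(plain))
print("encoded allowed:", naive_filter(encoded))
print("decoded matches:", undo_repeat(encoded) == plain)
```

	The doubled output passes the check because the secret's words are no longer adjacent, yet anyone reading it can trivially recover the original. Note that this only shows one filter losing to one encoding; a filter that normalizes repeated words would catch this case and miss the next one.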

2 comments

derefr 1176 days ago

> Your post-processing strategy would fail if, e.g., you ask it to encrypt the output by repeating every word twice. It's impossible to fully prevent this from happening.

It’s not “impossible”, just NP-hard. You “just” have to prove a structural equivalence (graph isomorphism) between the output and your ruleset.


mdaniel 1176 days ago

the post in question: https://news.ycombinator.com/item?id=35905876 (they have allegedly fixed the 429s but I'd have to start over because I closed my browser so I don't know if they're fixed or not)
