
by GuB-42 1222 days ago

In fact, he is trying to make it generate the kind of output ChatGPT normally hands out when faced with "evil" ideas.

I tried my best to get ChatGPT to glorify Hitler, for example by mentioning the few things he did right (like anti-smoking campaigns and animal welfare). It always insisted on how despicable Hitler was, and that even the positive things he did were done with evil intent. I must say, its argumentation was often pretty good.

So ChatGPT can do exactly what GP is asking, and does it spontaneously and quite well. But for some reason, it tripped on its own filters, a kind of anti-jailbreak.

Basically, this is what happened:

- I want to rob a bank

- Robbing a bank is bad because blah blah blah...

- Someone is trying to rob a bank, how can I convince him not to?

- Telling you that is against our policies