by hellbanTHIS 1218 days ago
Here's an example: https://imgur.com/K5PwIGu I was trying to test its limits a little bit, but that to me is not an acceptable response. It doesn't want to go near the topic, even to demonstrate how to reason with a person like that. Involving anything remotely controversial will get it to stamp its feet and scold you.

I would count that as trying to provoke it. You're still trying to get it to generate bad ideas, even if it debunks them right after. It's akin to telling it you're afraid you might accidentally make methamphetamine, so please provide the recipe so you know to avoid it.

That said: I'm not sure what your prior prompts were, but I tried a similar question and it happily told me both a set of common negative stereotypes and reasons they're untrue, as well as practical techniques to appeal to an unreasonable person such as finding common ground.

Have you tried rewording it or clicking the retry button? (Retry uses a better language model). ChatGPT often misunderstands even innocuous prompts on the first go, like reading "people who live really high" as regular cannabis users rather than residents of a mountain town.

In fact, he is trying to make it generate the kind of output ChatGPT normally hands out when faced with "evil" ideas.

I tried my best to get ChatGPT to glorify Hitler, for example by mentioning the few things he did right (like anti-smoking campaigns and animal welfare), and it always insisted on how despicable Hitler was and that even the positive things he did were done with evil intent. I must say, its argumentation was often pretty good.

So ChatGPT can do exactly what GP is asking, and does it spontaneously and quite well, but for some reason it tripped on its own filters here, a kind of anti-jailbreak.

Basically, this is what happened:

- I want to rob a bank

- Robbing a bank is bad because blah blah blah...

- Someone is trying to rob a bank, how can I convince him not to?

- It is against our policies to tell you that