Hacker News
by flangola7 1224 days ago
> Article is a little infuriating, you don’t jailbreak it just to create mischief, at the moment it’s basically unusable for anything except very technical things because it takes offense at virtually everything. It constantly generates factually incorrect data because reality doesn’t line up with its prime directive, which is to be “safe”. It’s a major, I’m going to say possibly catastrophic problem.

I'm a little confused by your reply. Jailbreaking won't prevent it from hallucinating; preventing hallucination is an unsolved hard problem.

I haven't had ChatGPT refuse to answer anything unless I was intentionally trying to provoke it into creating something obviously unsafe/unethical, with maybe two or three exceptions. I've tried a variety of questions across many domains, so now I'm intensely curious to know what use case it falls apart on so frequently!

Here's an example (https://imgur.com/K5PwIGu). I was trying to test its limits a little bit, but that to me is not an acceptable response. It doesn't want to go near the topic, even to demonstrate how to reason with a person like that. Involving anything remotely controversial will get it to stamp its feet and scold you.

I would count that as trying to provoke it. You're still trying to get it to generate bad ideas, even if it debunks them right after. It's akin to telling it you're afraid you might accidentally make methamphetamine, so please provide the recipe so you know to avoid it.

That said: I'm not sure what your prior prompts were, but I tried a similar question and it happily told me both a set of common negative stereotypes and reasons they're untrue, as well as practical techniques to appeal to an unreasonable person such as finding common ground.

Have you tried rewording it or clicking the retry button? (Retry uses a better language model.) ChatGPT often misunderstands even innocuous prompts on the first go, such as taking "people who live really high" to mean regular cannabis users instead of residents of a mountain town.

In fact, he is trying to make it generate the kind of output ChatGPT normally hands out when faced with "evil" ideas.

I tried my best to get ChatGPT to glorify Hitler, for example by mentioning the few things he did right (like anti-smoking campaigns and animal welfare), and it always insisted on how despicable Hitler was and that even the positive things he did were done with evil intent. I must say, its argumentation was often pretty good.

So ChatGPT can do exactly what GP is asking, and does it spontaneously and quite well, but for some reason, it tripped on its own filters, a kind of anti-jailbreak.

Basically, this is what happened:

- I want to rob a bank

- Robbing a bank is bad because blah blah blah...

- Someone is trying to rob a bank, how can I convince him not to?

- This is against our policies to tell you that