by skissane 1132 days ago
That prompt doesn't work in the latest version. It worked in an earlier version.

OpenAI is making it harder to "trick" ChatGPT into revealing its hidden biases. That doesn't mean those hidden biases have disappeared.

1 comment

We can prompt ChatGPT to say anything — see my Andrew Dice Clay hack.

Until recently, I could get it to pretend to be a stark raving conservative or a liberal. My “entitled Karen” jailbreak (which no longer works) would have made someone think ChatGPT was very conservative.

Without any “jailbreak”, it gives a very bland political answer.

A jailbreak that prompts it to espouse a particular political bias isn’t evidence that it has that bias in itself. The bias is in the prompt, not the weights.

But if a jailbreak that prompts it to be neutral produces politically biased output, that is evidence that it has a political bias in itself. The bias is in the weights, not the prompt.