by scarface74 1132 days ago
We can prompt ChatGPT to say anything — see my Andrew Dice Clay hack.

Until recently, I could get it to pretend to be either a stark raving conservative or a stark raving liberal. My “entitled Karen” jailbreak (which no longer works) would make someone think ChatGPT was very conservative.

Without any “jailbreak”, it gives a very bland political answer.

1 comment

A jailbreak that prompts it to espouse a particular political bias isn’t evidence that the model itself has that bias. The bias is in the prompt, not the weights.

But if a jailbreak that prompts it to be neutral still produces politically biased output, that is evidence that the model itself has a political bias. The bias is in the weights, not the prompt.