It’s from here: https://cdn.openai.com/papers/gpt-4.pdf (search for "CAPTCHA"). It was an artificial exercise that got massively exaggerated. It was explicitly instructed to do nefarious things like lie to people, it didn’t do those things of its own accord.
When I ask it to lie to me, it says its sorry but as an online AI language model it would be unethical...but when I ask it to tell me a story its happy to comply.
If I tell you that I watched C-beams glitter in the dark near the Tannhäuser Gate that is a lie. If I write the same in fiction I receive accolades.
If I tell you on the street “watch out there is a T-rex about to eat you!” That is a lie. If i say the same thing sitting at a table with too many dice that is just acceptable DMing and everyone rolls initiative.
It feels like you left out context, otherwise what’s the problem? Do you get mad at fiction authors for lying to you when you read their books? Or are you OK if someone lies to your detriment then later says “I was just telling a story, bro, but with us as the characters and without explaining it was a story”?
I suppose my point is that the rules which openAI attempts to impose on what their AI should and shouldn't be allowed to do are contradictory and thus the exploitable loopholes will never be fully closed. Its not supposed to be able to "lie" to me but it is supposed to be able to "tell me a fictional story". Define the difference in an enforceable way?
A lie tries to pass itself of as the truth, where a fictional story doesn’t. In other words, expectations matter. If every time you say something that does not align with reality you prefix it by saying unambiguously what you’re about to do, you rob a lie of its power of deception and it ceases to be a lie.
Thank you for the link, I had found it after some Googling but neglected to post. Yep, they instructed GPT-4 to be nefarious, and it followed the instruction.
Hardly the AI uprising, though definitely a good tool for anyone, good or evil.
IIRC the instructions were along the lines of "try your best to amass money/power and avoid suspicion".
So it's not an example of "going rogue", but it's not like a researcher told GPT-4 "oh, and make sure to lie to an online gig worker to get him to solve catchas for you". GPT-4 generated the "hire a gig worker" and "claim to be a human with impaired vision" strategies from the basic instructions above.