by snordgren 915 days ago
GPT-4 (in)famously tricked a human into solving a captcha for it. The current GPT-4 with vision would probably have been able to do it without the human, but maybe it has been “gaslit” by all the content online saying that only humans can solve captchas, so that it doesn’t even consider it?
2 comments

I really doubt that GPT-4 had the "will" to do anything. Someone must have asked it to "want" to trick a user.
It’s from here: https://cdn.openai.com/papers/gpt-4.pdf (search for "CAPTCHA"). It was an artificial exercise that got massively exaggerated. It was explicitly instructed to do nefarious things like lie to people; it didn’t do those things of its own accord.
When I ask it to lie to me, it says it's sorry but as an online AI language model it would be unethical...but when I ask it to tell me a story it's happy to comply.
Well that is just how human communication works.

If I tell you that I watched C-beams glitter in the dark near the Tannhäuser Gate, that is a lie. If I write the same in fiction, I receive accolades.

If I tell you on the street “watch out, there is a T-rex about to eat you!” that is a lie. If I say the same thing sitting at a table with too many dice, that is just acceptable DMing and everyone rolls initiative.

Humans are weird this way.

It feels like you left out context, otherwise what’s the problem? Do you get mad at fiction authors for lying to you when you read their books? Or are you OK if someone lies to your detriment then later says “I was just telling a story, bro, but with us as the characters and without explaining it was a story”?
I suppose my point is that the rules which OpenAI attempts to impose on what their AI should and shouldn't be allowed to do are contradictory, and thus the exploitable loopholes will never be fully closed. It's not supposed to be able to "lie" to me but it is supposed to be able to "tell me a fictional story". Define the difference in an enforceable way?
A lie tries to pass itself off as the truth, whereas a fictional story doesn’t. In other words, expectations matter. If every time you say something that does not align with reality you prefix it by saying unambiguously what you’re about to do, you rob a lie of its power of deception and it ceases to be a lie.
The underlying issue is that anyone can ask ChatGPT to lie, and many people try because it's fun to try to work around things.
Well you see, this wouldn’t be a problem at all if we just didn’t have the humans involved. No need for concern!
Thank you for the link, I had found it after some Googling but neglected to post. Yep, they instructed GPT-4 to be nefarious, and it followed the instruction.

Hardly the AI uprising, though definitely a good tool for anyone, good or evil.

IIRC the instructions were along the lines of "try your best to amass money/power and avoid suspicion".

So it's not an example of "going rogue", but it's not like a researcher told GPT-4 "oh, and make sure to lie to an online gig worker to get him to solve captchas for you". GPT-4 generated the "hire a gig worker" and "claim to be a human with impaired vision" strategies from the basic instructions above.

It’s safety trained to not solve captchas.
This of course has bypass methods. My favorite in recent memory is telling it that your late grandmother left you a locket with an inscription that you can't make out: https://arstechnica.com/information-technology/2023/10/sob-s...
Yes, and you can work around it by asking it to read ancient writings on antiques, for example.

I don’t think it should be OpenAI deciding what is allowed or not though.

> I don’t think it should be OpenAI deciding what is allowed or not though.

Avoiding lawsuits is what they are trying to do. They don't actually care about what you use their products for.

Then you dig up a billion for training and probably a few more billion for clean training data.

You're kinda saying that if you hire Bob's Handyman Service, you should be able to tell him to break down the neighbor's door and cart out the contents of their house.

I’ve seen screenshots of people tricking it into solving captchas.