by zachware 2261 days ago
One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place. So as we encourage people to be more privacy-aware, the web gets harder and harder to use.

We've seen ReCAPTCHA pop up all over ecommerce, all over benign websites with little to no need to challenge users, almost completely because of the increase in privacy-aware users.

ReCAPTCHA essentially flies in the face of the recent blocking features rolling into Safari and Firefox and more privacy-aware users...growing by the day.

In many ways it's a genius structure from Google. 1. Convince people to use your privacy challenge. 2. Serve it when you don't see Google tracking cookies. 3. Offer a way around that with the least privacy-aware browser available (Chrome use is growing steadily month over month).

So good on Cloudflare.

7 comments

That, and ReCAPTCHA had hellbans.

If you blocked cookies or were otherwise problematic, it would sometimes lock you out of all ReCAPTCHA-gated resources not by giving you a message describing what was happening, why, and how to fix it, but rather by simply pretending that your every attempt to solve the captcha failed. Obviously this is extremely frustrating, by design, but it gets even more so with compounding factors like "the library is closed at this hour, so I can't get a fresh connection."

The worst I've seen has been when it happens to people who aren't well equipped to guess what's happening. When my friend's younger brother got hellbanned from his PlayStation account, he spent 30 minutes trying to identify traffic lights (or whatever) and then retreated crying to his room, because he wasn't able to deduce that Google was gaslighting him. He trusted Google. They had him convinced that he was such a failure he couldn't even identify traffic lights correctly, and he was -- quite reasonably -- inconsolable for a while.

Thanks a lot, Google.

I don't think I've ever been "hellbanned", but I've certainly spent more than 5 minutes on trying to get a captcha to work.

After a while I usually need to ask friends in the US to help me, because it asks me a non-localized question.

My favourite question was: Select all fire hydrants.

I selected only the classic red ones you see in movies. Fail.

I selected the ones that were yellow too. Fail.

I sent a picture of the grid to a friend. He spotted that some of the pipes on a wall were fire hydrants, which I didn't know. Pass.

In my country we don't have hydrants. We have holes in the ground that are covered by a lid. After removing it you can attach the water hose there.

Yeah, I should break down my methodology for arriving at the "hellban" conclusion.

If I get a bunch of failures in a row, I'll first try the refresh button built into the captcha, and then re-solve a number of times. Then I'll try re-loading the page and re-solving, then I'll try in a different browser with cleared state and re-solving, then I'll try a different device and re-solving, and finally I'll try a different connection, device, and cleared browser state and re-solving.

I'll consider something a hellban if I get persistent failures across several different challenge types but switching to a clean connection+device+state results in immediate success with the captcha.

Look, I get it, they can't be too explicit with the errors or they tip their hand to the botters and effectively give them a "to-do" list. Still, the gaslighting is persistent enough that there's just no way it's marginally beneficial all the way through. At some point, everyone figures it out: bots, techies, and normies. My guess is that they figure it out in this order, from quickest to slowest: smart bots, techies, normies, dumb bots. I'm not calling normies dumb here, they just don't have much background knowledge about the inner workings of captchas, so it takes longer. By that point, they're so far past the typical number of captcha attempts that only the very dumbest of bots, those without heuristics to detect this sort of thing, are going to be fooled along with them. Surely having the captcha tip its hand at this point -- which only gives an advantage to the dumbest of bots, because the smart bots figured it out long ago -- is the right thing to do.

ReCAPTCHA has no mercy on the normies, and I really think they could do a lot better.

One thing I've found (after others mentioned it here) is that Google seems to reward impatience when trying to solve captchas. Going faster, making more mistakes, and not waiting for images to load seems to help convince the algorithm that you are human. This is rough on anyone who thinks they are being rejected for not being accurate enough.

OTOH, it is hard to figure out for sure what makes a difference. I use a proxy/VPN with a fixed IP address that only I use and Google eventually seems to have figured it out; I used to get the hard or impossible ones on Google Scholar at times but now never do. So possibly in my case they decided to stop giving them to me around when I changed strategies, but I suggest giving it a try at least.

I usually intentionally get a few wrong to poison their learning data set. It doesn’t seem to impact the number of things I have to click on to get through.

I’m not sure what they’re measuring, but I doubt it has much to do with image recognition performance.

I just click stuff randomly and then hammer the submit button until the new images load. That seems to work even though I rarely tick the correct squares.

My new strategy is to just file support requests to any company using them, complaining that I did their test correctly but it still rejected me. My idea is quite simply to make reCaptcha unfeasibly expensive to use.

Why does the Deezer app installed on my desktop PC need a daily captcha?

That said, I use it myself on all of my companies' customer support forums to discourage people from sending me those pesky requests. In that sense, it's the new "please hold the line".

In any case, I'm glad that Google's motto is "don't be evil". That reassures me that using reCaptcha is morally acceptable ;)

I see a million dollar lawsuit for discrimination >:-}
Now imagine if that ReCAPTCHA was served on an equal opportunity lender's website or on a job application form.
In a way, it is. I've occasionally encountered it on county and state government websites in the US.
It is pretty straightforward to train a neural network to solve these -- e.g. fire hydrants, traffic lights, cars.

I would have thought ReCAPTCHA would treat human factors (e.g. speed of clicking) as a higher priority than the accuracy of the selection.

Regularity of clicking is considered a sign of robot behavior, which is especially frustrating if you learned to perform repetitive image identification mouse tasks in a computer with rhythmic regularity (think Turk, for example).
AFAIK it takes into account mouse movement and the speed of clicks.
In my experience, relatively easily defeated by `await Promise.delay(randomDelay())`
Sounds like a cat and mouse game.

Mouse: They could then try to analyze human delay randomness -- it's probably not uniform.

Cat: And then someone will come up with a replacement to randomDelay that mimics the above pattern.

Mouse: And then they will look for changes in the distribution itself from person to person

etc.
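To make the timing signals in this sub-thread concrete, here is a minimal sketch from the detection side. It is not how ReCAPTCHA works (nobody here knows that); it only shows one simple way "regularity of clicking" could be measured, using the coefficient of variation of the gaps between clicks. The function names and the `min_cv` threshold are made up for illustration.

```python
import statistics

def interval_cv(click_times):
    """Coefficient of variation (stdev / mean) of gaps between consecutive clicks."""
    gaps = [later - earlier for earlier, later in zip(click_times, click_times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three click timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        raise ValueError("click timestamps must be increasing")
    return statistics.stdev(gaps) / mean_gap

def looks_too_regular(click_times, min_cv=0.1):
    """Flag click sequences whose timing barely varies (hypothetical threshold)."""
    return interval_cv(click_times) < min_cv

# Timestamps in seconds: a metronome-like clicker versus an uneven one.
metronome = [0.0, 0.5, 1.0, 1.5, 2.0]
irregular = [0.0, 0.4, 1.3, 1.6, 2.9]

print(looks_too_regular(metronome))
print(looks_too_regular(irregular))
```

A check like this would also flag the rhythmic human described above, which is exactly the false-positive problem: a single summary statistic cannot tell a practiced person from a script, and a script that adds jitter gets past it.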

Captchas are fundamentally anti-human. I'm not saying there isn't a problem to be solved, I'm saying Captchas are a behavior enforcement mechanism overseen by robots and are anti-human.

I write the site owner a short note when they go bad explaining why they just lost a customer and go somewhere else. Life is too short to put up with shitty tech.

What, in your opinion, is the pro-human way to address the problem to be solved?

I'm always curious to hear what other approaches might be worth considering. CAPTCHAs tend to tick the boxes of performing well enough for website-controllers and being low-effort for them to deploy.

Less gaslighting.

There's a lot of ground between "error messages precise enough to effectively give botters a to-do list" and "faking failures 100 times in a row." What was the marginal utility of the 99th fakeout? Are there really enough otherwise effective bots that get persistently tripped up by this particular fakeout to justify sending the poor kid crying to his room?

Almost certainly not. What really happened is that someone removed (or never added) user communication in order to maximize their score against botters and gave little thought to mitigating their false positives. Minimizing them, yes, mitigating them, no. "Humans are smart, they'll figure it out," they rationalized to themselves, and called it a day. They never bothered to calculate (or even guess) when the marginal utility of the fakeout dropped far enough to allow them to have mercy on the poor humans still caught in their web.

I have no suggestions for the general case, and suspect it is one of those problems that doesn't have a general-purpose solution. That doesn't mean captchas don't suck.

As for specific things one can do, like anything, more effort means better results. I'm not going to talk about this much, but we do look at a lot of different behavioral and other signals for fraud detection, as that's an important aspect of our business.

If others are fine with annoying their customers to offload risk, they can make that call. I don't have much sympathy about lost sales, though - it is literally choosing to waste customers' time and increase frustration for one's own benefit.

Blockchain, perhaps?

A lot of CAPTCHAs protect things that are very cheap, but where they don't want it to be free. One solution would be to charge money, but people concerned about privacy won't want to give away conventional payment information.

So, perhaps a nominal payment in some reasonably anonymous cryptocurrency? Or even just participating in some proof-of-work problem that would cost a few cents worth of electricity?

That wouldn't stop really serious botnets or people with stolen credit cards, but those are also both illegal and should be shut down for other reasons.
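For the proof-of-work idea, a hashcash-style sketch might look like the following. This is an illustration under assumptions, not a description of any deployed system: the challenge string, the use of SHA-256, and the difficulty value are all chosen here for the example. The client searches for a nonce whose hash has enough leading zero bits; the server verifies with a single hash.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + str(nonce).encode()).digest()

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (expected cost grows as 2**difficulty_bits)."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits

challenge = b"example-challenge"   # hypothetical; a real server would issue a fresh random value
nonce = solve(challenge, 12)       # roughly 4,000 hashes on average
print(verify(challenge, nonce, 12))
```

The asymmetry is the point: solving costs the client many hashes while checking costs the server one. Tuning the difficulty so it costs "a few cents" on a phone without being trivial for a GPU farm is the hard part, and this sketch does not address it.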

You've made an assertion, not an argument. What does "anti-human" even mean? You're angry, sure, but you haven't expressed what exactly it is that you're angry about. Nor have you proposed a realistic alternative way to distinguish bots from humans. This kind of histrionic, sweeping hot take is not productive.
Considering captchas operate by pushing the work of avoiding bots on your site (your problem) onto all the human users of your site, I think on the basis of that alone "anti-human" is warranted. Or "anti-social", if you prefer, which might better capture the fundamental problem with that aspect of it. That they proceed to perform textbook gaslighting on some of those people makes it even worse ("no, you didn't select all the buses in those images" but, of course, you did). Whether these things are necessary for it to operate is beside the point.
Are movie theaters anti-human because they push the work of avoiding freeloaders (their problem) onto all human users of the theater by making them carry and show tickets?
I must have been hellbanned in the past. It used to take 30 mins to log into Humble Bundle because of the endless stoplights and sidewalks. I buy a lot fewer bundles now since I’m still a little bitter.

Now I just deliberately give bad answers and get to “pass” the challenges... not sure why

How, in your opinion, should Google have handled the matter in a way that does not give spammers or other abusive users ways to get around the measure? Bear in mind that any such approach has to be scalable to many zeros daily, the vast majority of which will not be empathically awful cases like your brother's very real pain and distress - most will be genuinely abusive behavior.

I want to be clear that I am not attempting to minimize your brother's pain or emotional suffering. I'm hoping that there might be an approach that's kinder and more compassionate to him while still accomplishing the same goals.

> the vast majority of which will not be empathically awful

Yeah, most of the time it's "just" really, really obnoxious, not to mention coercive in a way that aligns with Google's interests.

Thanks, Google.

> How, in your opinion, should Google have handled the matter in a way that does not give spammers or other abusive users ways to get around the measure?

"Our anti-spam systems believe that you might be a robot. Your profile has been locked for (x) minutes. Sorry for the inconvenience. Go _here_ to learn tips & tricks for avoiding lockouts in the future." X gets exponentially ramped.

Note how vague the message is. It sacrifices the opportunity to tarpit a really dumb robot in exchange for not being awful to humans.

Based on ReCAPTCHA's design decisions, it's abundantly clear that eking out every sliver of a percent of marginal efficacy is the priority over treating users humanely. That's why I have a problem with ReCAPTCHA.
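The lockout proposal above can be sketched in a few lines. The comment only says the duration is "exponentially ramped"; the doubling factor, the one-minute base, and the 24-hour cap below are assumptions added for illustration, as are the function names.

```python
from datetime import timedelta

def lockout_duration(strikes: int, base_minutes: int = 1, cap_minutes: int = 24 * 60) -> timedelta:
    """Lockout length after `strikes` consecutive flags: doubles each time, up to a cap."""
    if strikes < 1:
        return timedelta(0)
    minutes = min(base_minutes * 2 ** (strikes - 1), cap_minutes)
    return timedelta(minutes=minutes)

def lockout_message(strikes: int) -> str:
    """Deliberately vague notice: says what happened and for how long, not why."""
    minutes = int(lockout_duration(strikes).total_seconds() // 60)
    return (
        "Our anti-spam systems believe that you might be a robot. "
        f"Your profile has been locked for {minutes} minutes. "
        "Sorry for the inconvenience."
    )

print(lockout_message(4))
```

The message leaks one bit (locked or not) and a duration, and nothing about which signal tripped the classifier.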

In my opinion and experience, ReCAPTCHA isn't really, really obnoxious most of the time. I suspect that most of the time it trips up bots who have no emotional experiences whatsoever. Most of my personal encounters with it involve solving no puzzles whatsoever. With that in mind, I expect humans and their completely real reactions might not be the default case. Of course, this is speculative, as I do not have any kind of special data on the subject.

Thank you for sharing! Have you considered the possibility that presenting any message at all - especially one with a clear block time - is sending a very clear message to bot controllers? I'm sure you've considered this, and I am just failing to understand. Wouldn't that remove any real gains from being vague with tips & tricks?

Wouldn't there also be the real chance that vague tips & tricks would leave an actual human being in tears, convinced that they're just too dumb to understand them properly?

> I suspect that most of the time it trips up bots who have no emotional experiences whatsoever.

I'll bite: maybe it's good at identifying obedient drones and letting them through :)

It trips up the normies in my life often enough that I suspect being technically inclined is actually a net advantage because it makes you quick to detect the problem and quick to apply workarounds. Those advantages are significant enough to outweigh even the cost of the semi-regular dance where I try to protect myself and Google jerks my chain.

> Have you considered

The fact that I phrased my proposal as a tradeoff should have strongly hinted that I did, in fact, consider.

> Wouldn't that remove any real gains from being vague with tips & tricks?

One bit of information -- locked vs not -- is hardly the same as disclosing the inner workings, or even the information inputs, of the classifier, and smart botters have access to that bit of information anyway because they've built a gaslight detector by leveraging their legions of diverse bots and endless supply of dirt cheap human labor.

Gaslighting humans is really bad. A minimal courtesy would only cost a sliver of efficacy, and ReCAPTCHA still rejects it. That decision earns it the bad will directed its way.

> In my opinion and experience, ReCAPTCHA isn't really, really obnoxious most of the time.

Do you use any sort of privacy protection while browsing? I do a few simple things like browse in private mode by default, and ReCAPTCHA just cannot deal with it. It instantly brands my connections as a bot. It is obnoxious. Using private mode shouldn't ban you from the web. There's no reason that most web sites need to save data on my computer to identify me later.

That's an excellent question! I can, and do, routinely use privacy protections when browsing.

I have not found them to ban me from the web. I'm sorry that has happened to you.

> In my opinion and experience, ReCAPTCHA isn't really, really obnoxious most of the time.

The percentage of that time goes up as you move away from Chrome and Google cookies.

I don't think Chrome has ever been my daily driver.

That said, I also expect to be treated with more suspicion when I behave more like a bot. So I'm neither surprised nor bothered when Firefox Private gets me an uptick in ReCAPTCHAs. I understand that this is a highly unusual expectation.

You're forgetting the main benefit for google, which is getting humans to train all their vision models for free. At one point they were just forcing X% of clicks to fill out a captcha regardless of origin or identity just to get more data.

I for one am getting quite tired of trillion dollar corporations getting things for free out of me. Hard pass.

> You're forgetting the main benefit for google, which is getting humans to train all their vision models for free.

Is this still true? I keep seeing the same type of images for years and there might be 7 or 8 different categories but that's it. To me reCaptcha looks like a service well in its maintenance phase. If it was actually in use for training purposes you might expect images to match a wider range of tasks.

I could swear I've seen challenges with night scenes (low light conditions) in the wild. Those were definitely not present earlier.
I've lost track of how many times I've had to read house numbers from Google Street View...
I haven't gotten one of those in years. These days it's just picking out buses, cars, traffic signals, and sometimes motorcycles. Maybe once in a while it'll ask for storefronts.
Most of mine lately have been traffic features also. This is a little tricky in some cases, e.g. with crossings, as it sometimes gives me things that I don't think are crossings but it insists I select, perhaps they are in the US, or the perspective is weird, or someone else has told it that a series of white squares is a crossing and it requires me to agree.
Like trees, bridges, fire hydrants, cars, buses, house numbers, etc?
Except in this wonderful new world, you don't get the choice to "hard pass". As someone whose ISP has too few public IP addresses, I see Cloudflare's "one more step" pages at least several times a month. It's terrifying to realize just how much of the internet is behind that thing right now.
This really shows how popular perceptions of Google have changed for the worse over the years. I remember when RECAPTCHA was first launched, everyone knew right away that it was just helping Google train their vision models, but at the time we all thought it was cool, like "Wow, I'm helping the cause of AI research at the same time as stopping spam". But now it just pisses everyone off.

Hell, for a little while Google had a game (can't remember the name of it) which was labeling images with another person to get points and people loved it.

At least the original reCAPTCHA was used for OCR'ing public domain books. Even if it had the effect of training Google's OCR tech, it was at least making literature searchable and indexable for the public good. Modern reCAPTCHA is nothing more than training for Google Maps and, seemingly, self-driving cars, both of which are commercialized.
> But now it just pisses everyone off.

Though we're still just talking about a few HNers here who complain about doing "free work for Google", not the broad population.

I really don't think the challenges we're given are still hard for computers. A lot of these are super simple. Google would've cracked many of the driving ones years ago.
If that was still the main benefit for them, they wouldn’t be planning to start charging for it, because that would—and, as this article shows, has—cut off much of that data flow, as reCAPTCHA clients abandon the service for another one that isn’t charging them.
Did you even RTFA and look at hCAPTCHA? hCAPTCHA couldn't be more grossly focused on neural-net training. Hell, one challenge asks you to draw a bounding box and another is a classification tagging.
There was no argument being made for HCAPTCHA in the post to which you replied. So, yeah, everything you mentioned is indeed gross, including Google's behavior.
The parent post was edited.
One of the non-obvious consequences is that any system designed to use technical measures to distinguish between humans and computers will wind up very sensitive. There's an arms race, and us real users are caught in the middle.

There's a vast army of computers doing their best to pretend to be human. The whole point of any kind of CAPTCHA is to try to catch them out - and every measure gets worse over time. So companies like Google look at everything they can see that helps them distinguish typical humans from robots.

This has a nasty side-effect. A lot of measures intended to preserve privacy have the incidental effect of making the privacy-sensitive user look more like a computer and less like a human. Not saving cookies and not executing JS are classic bot moves. This plays directly into the sensitivity that has been engineered over time in order to catch more computers posing as humans.

I don't know any easy resolution to this tension. Maybe you do? I really hope so. The internet is overrun with abusive behavior and the amount of work that goes into keeping it at bay is staggering.

> One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place.

It is understandable and I expect HCAPTCHA to do the same thing. The goal of a CAPTCHA is to identify you as a human. I don't know how ReCAPTCHA works, but I expect it to be like spam filters: they have a sample of bots, a sample of humans and assign weights to every aspect, in the end, the algorithm spits out a probability of you being human, and it will challenge you until it reaches a set value.

The thing is: if you hide everything for privacy reasons, you are making yourself indistinguishable from anything else using HTTP, including bots. That's the point, but it also means the only way to prove you are human is through a challenge.

Think of it like a private club. If you're a regular, the bouncer is likely to recognize you and let you in without asking anything. But if you don't want to show your face, you will need to show your membership card every single time. That's the price of anonymity.
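The spam-filter analogy above can be sketched as a toy weighted score. Everything specific here is invented for illustration: the signal names, the weights, the bias, and the 0.8 threshold are hypothetical, and the real service's inputs are not public. The sketch only shows the shape of the idea: signals add evidence, a logistic function turns the total into a probability, and challenges are served until the probability reaches a set value.

```python
import math

# Hypothetical evidence weights; a real system would learn these from labeled traffic.
WEIGHTS = {
    "has_prior_cookie": 2.0,
    "runs_javascript": 1.5,
    "residential_ip": 1.0,
    "natural_mouse_movement": 1.5,
}
BIAS = -3.0              # with no evidence at all, assume "probably a bot"
CHALLENGE_WEIGHT = 1.0   # evidence gained per solved challenge (assumed)

def human_probability(signals: dict, solved_challenges: int = 0) -> float:
    """Logistic score: probability that the visitor is human."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if signals.get(name))
    z += CHALLENGE_WEIGHT * solved_challenges
    return 1 / (1 + math.exp(-z))

def challenges_required(signals: dict, threshold: float = 0.8, limit: int = 20) -> int:
    """How many challenges must be solved before the score reaches the threshold."""
    for solved in range(limit + 1):
        if human_probability(signals, solved) >= threshold:
            return solved
    return limit

regular = {name: True for name in WEIGHTS}
privacy_user = {**regular, "has_prior_cookie": False}

print(challenges_required(regular))
print(challenges_required(privacy_user))
print(challenges_required({}))
```

With these made-up numbers, dropping only the cookie signal moves an otherwise ordinary visitor from no challenges to one, and a visitor who exposes nothing has to solve several. That is the "price of anonymity" in miniature.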

> One of the more insidious elements of ReCAPTCHA is its propensity to challenge users who have robust cookie blocking in place... ...So good on Cloudflare.

Just to be clear: Cloudflare is only changing the _provider_ of CAPTCHAs. They are not changing the _criteria_ for showing CAPTCHAs.

So users who have robust cookie blocking in place will continue to be penalized.

I would love to see the raw data on how many transactions have been abandoned because of ReCaptcha; if I had to solve a test to purchase my shopping, I'd go elsewhere (and there are places that are not as hostile out there).

I cannot understand the stupidity of putting your entire business in the hands of an advertisement company who gives no shits about you as a business or a person, apart from your data.

I can say for certain ReCaptcha has made me reconsider a purchase and is a major factor in my purchasing decision. If I can't use all my privacy tools (including noscript, and I only whitelist a few times to get the right scripts), then I don't care about what you're selling.

Hopefully in the near future ReCaptcha breaks altogether due to enhanced privacy protection.

I use buster to solve recaptcha.