by RonanTheGrey 2230 days ago
Nobody here is talking about the elephant in the room where reCAPTCHA is concerned (and hCaptcha has the same problem):

The other day when Google was having issues (the same day that a bunch of Android apps were crashing due to a bad map data push), I was unable to log into my bank, unable to pay my electric bill, and a half dozen other things I needed to do that day.

Because Google's servers were down, core service providers couldn't do anything either, because they block access to their sites unless reCAPTCHA approves the entry.

To me, as a technologist, as a builder of software, this is absolutely and entirely unacceptable. Captcha needs to be something you can self host.

I don't understand this habit of handing Google a knife and then telling them where to stab you.

3 comments

I'm going to guess people aren't typically talking about it for a few reasons:

- We started out with self-generated and self-hosted captcha. It was too easy to beat. The complexity of the image generation kept getting turned up until eventually it was easier to just outsource it to someone else. Going to throw out a guess here that reCAPTCHA is far from simple, and likely exceeds what most teams would want to run internally.

- Google has an uptime that's significantly higher than most companies. I'm not defending any of Google's habits or business practices, but I personally wouldn't bet that most companies can run software more reliably than Google.

- As someone else mentioned, fail open is an option in situations like these (depending on the threats you're trying to protect against). For something with a high probability of failure, this could make sense, but I would have a hard time imagining a team allocating time to deal with the case "when Google is down" unless it's truly life or death software (think: surgical robots, autopilots, etc.).

Why was self-generated and self-hosted captcha easy to beat?

I found that generating math questions in a captcha style (curved, with other noise drawn over them) and requiring the question to be answered in a box is unbeatable. The bad actor would need very good OCR and, after that, a good math parser to answer. Easy for humans, very hard for automation. And the script that did that was like 50 lines long.
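For concreteness, here is a minimal sketch of the server-side half of a math-question captcha like the one described above. It is not the commenter's script: the image rendering and distortion are left out entirely, and `SECRET_KEY`, `make_challenge`, and `check_answer` are hypothetical names. It assumes the question text would be drawn into a noisy image before being shown, and that the signed token travels with the form in a hidden field.

```python
import hashlib
import hmac
import random
import secrets

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client


def _sign(nonce, answer):
    # Bind the expected answer to a one-off nonce so tokens differ per challenge.
    msg = f"{nonce}:{answer}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_challenge(rng=random):
    """Return (question_text, token). The question would be rendered as an image."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    op = rng.choice("+-*")
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    nonce = secrets.token_hex(8)
    return f"{a} {op} {b} = ?", f"{nonce}:{_sign(nonce, answer)}"


def check_answer(user_input, token):
    """Return True if user_input is the integer answer that the token was signed for."""
    nonce, _, digest = token.partition(":")
    try:
        value = int(user_input.strip())
    except ValueError:
        return False
    expected = _sign(nonce, value)
    return hmac.compare_digest(expected.encode(), digest.encode())
```

Note that this sketch does not track used nonces, so a solved token could be replayed; a real deployment would need to expire or record them. It also says nothing about how hard the rendered image is for OCR, which is the part the replies below dispute.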

"easy for human" is very subjective. Users very regularly have a hard time with all forms of image captcha for a whole bunch of different reasons: visual acuity, color deficiency, learning disability, unclear instructions, visually similar characters, etc. If you allow users to refresh the image until they see an easy one they might be able to overcome it themselves but some percentage of those users will get frustrated and leave. Not to mention allowing regeneration of images also makes it easier for bots to cycle until they find one they're confident in. Surely if there were a dead simple for humans, difficult to beat for bots, 50 line script option for CAPTCHA generation that could be self hosted it would be in wide use.

reCAPTCHA changed to its current model to try to significantly reduce friction in the "hopefully normal" case (down to just a check box if all goes well) because every ounce of friction you add to critical inflection points in your product translates to meaningful lost opportunity.

Even if this weren't a problem, and it were trivial to create something that's easy for humans and hard for computers, it's just not worth most companies' time. Would they rather spend a few days properly implementing and testing a captcha solution, then whatever unknown time on future bug fixes and support, or set up reCAPTCHA in 30 minutes and move on to things that produce value for their customers?

I see that as an absolute win. If you're having problems understanding simple math questions then I won't want you as my user in the first place. Morons out.

As for visually impaired users, I agree this one is harder to crack. Usually you do it with audio, which in itself is more than 50 lines of code, but here is my personal approach. Absolutely nothing is stopping you from having, for visually impaired users, a separate step like the one described in OP, where you have mail activation. You see, visually impaired users have infinitely more patience than normal "visual" ones. They are used to the web not being friendly, so they won't mind jumping through extra hoops if they want your service. So a checkbox saying "I am visually impaired and I want registration by e-mail" or something equivalent and you're good to go.

A sensible person would implement the use of the captcha to fail open - if Google is down, then let the user in without passing a captcha.
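A rough sketch of what "fail open" could look like on the server, under some assumptions: `verify` is a hypothetical callable that wraps the call to the captcha provider, returns a truthy value for a valid token, and raises an `OSError` subclass (timeouts and connection errors are among these in Python) when the provider cannot be reached. `captcha_gate` is an illustrative name, not part of any provider's API.

```python
def captcha_gate(token, verify, fail_open=True):
    """Return True if the request may proceed.

    A verdict from the provider is always honored. Only when the provider
    is unreachable does the fail_open policy decide the outcome.
    """
    try:
        return bool(verify(token))
    except OSError:
        # TimeoutError and ConnectionError are OSError subclasses.
        # Provider is down: let the user in (fail open) or block (fail closed).
        return fail_open
```

Whether `fail_open=True` is acceptable depends on what the captcha is protecting; for a login form you would probably still want rate limiting behind it.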

Was this a mistake on the bank's part, or Google's?

Only if the probability of failure makes the extra effort worth it. Since this is a pretty rare event, a sensible person could well wait until they see actual impact before putting in the work. Hypothetical problems always vastly outnumber actually experienced ones.

It's surprising to me that on-prem reCAPTCHA isn't a service that seems to exist (based on a quick search).

Even if it's not Google's reCAPTCHA - is it so hard to make something like this that only Google can provide it? Surely the big players would want this component under their control exactly for reasons like "we don't want to have an outage due to a provider outage". Or at least, fail over to a less-preferred backup. Like if Cloudflare had such a service.
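The "fail over to a less-preferred backup" idea could be sketched roughly like this. Everything here is hypothetical: `providers` is an ordered list of provider names you have integrated, and `is_healthy` is a callable you would write yourself (for example a cached result of a periodic probe). The page would then render whichever provider's widget gets picked, since a token issued by one provider cannot be verified by another.

```python
def pick_provider(providers, is_healthy):
    """Return the first provider name whose health check passes, else None.

    providers is ordered from most preferred to least preferred.
    """
    for name in providers:
        try:
            if is_healthy(name):
                return name
        except OSError:
            # Treat a failing health probe the same as an unhealthy provider.
            continue
    return None
```

A result of `None` means every provider looked down, at which point you are back to choosing between failing open and failing closed.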

Cloudflare looked at the options (including building their own) and moved to hCaptcha (but mostly because Google wanted to charge them money), so it must be hard at that kind of scale, since bot writers are monetarily incentivized to defeat the captcha.

https://blog.cloudflare.com/moving-from-recaptcha-to-hcaptch...