by colinmorelli 2230 days ago
I'm going to guess people aren't typically talking about it for a few reasons:

- We started out with a self-generated, self-hosted CAPTCHA. It was too easy to beat. We kept turning up the complexity of the image generation until eventually it was easier to just outsource it to someone else. I'd guess that reCAPTCHA is far from simple, and likely exceeds what most teams would want to run internally.

- Google has an uptime that's significantly higher than most companies. I'm not defending any of Google's habits or business practices, but I personally wouldn't bet that most companies can run software more reliably than Google.

- As someone else mentioned, failing open (letting the request through when the check can't be completed) is an option in situations like these, depending on the threats you're trying to protect against. For something with a high probability of failure, this could make sense, but I have a hard time imagining a team allocating time to deal with the "when Google is down" case unless it's truly life-or-death software (think: surgical robots, autopilots, etc.).
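
To make the fail-open idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `CaptchaServiceError`, `is_human`, and the `verify` callable are hypothetical names, not part of reCAPTCHA or any real client library.

```python
from typing import Callable


class CaptchaServiceError(Exception):
    """Hypothetical error meaning the CAPTCHA provider could not be reached."""


def is_human(verify: Callable[[str], bool], token: str, fail_open: bool = True) -> bool:
    """Run the provider check; if the provider is down, fall back to a policy.

    `verify` is a stand-in for whatever call your provider's client exposes.
    """
    try:
        return verify(token)
    except CaptchaServiceError:
        # Provider outage: fail open (allow) or fail closed (deny).
        return fail_open
```

Whether `fail_open` should default to `True` depends on what the CAPTCHA is protecting: letting a few bots through during an outage is usually fine for a signup form, and probably not for something like a password reset.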


Why was self-generated and self-hosted captcha easy to beat?

I found that generating math questions in a captcha style (curved text, with other noise drawn over it) and requiring the question to be answered in a box is unbeatable. A bad actor would need very good OCR and, after that, a good math parser to answer. Easy for a human, very hard for automation. And the script that did it was about 50 lines long.
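
For reference, the non-image half of a scheme like this could look roughly like the sketch below. It is not the original script: `make_question`, `check_answer`, and the operator set are assumptions, and the part that actually matters for bot resistance (rendering the text as a distorted, noisy image) is left out entirely.

```python
import operator
import random
from typing import Optional, Tuple

# Assumed operator set; the original script's question format is not known.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_question(rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return (question text, expected answer) for a single-digit problem."""
    if rng is None:
        rng = random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    symbol = rng.choice(sorted(OPS))
    return f"{a} {symbol} {b} = ?", OPS[symbol](a, b)


def check_answer(expected: int, submitted: str) -> bool:
    """Compare the user's typed answer with the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input counts as a wrong answer.
        return False
```

The expected answer would have to stay server-side (for example in the session), and the question text would be drawn into the image rather than sent as plain text.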

"Easy for human" is very subjective. Users very regularly have a hard time with all forms of image captcha for a whole bunch of different reasons: visual acuity, color deficiency, learning disability, unclear instructions, visually similar characters, etc. If you allow users to refresh the image until they see an easy one, they might be able to overcome it themselves, but some percentage of those users will get frustrated and leave. Allowing regeneration of images also makes it easier for bots to cycle until they find one they're confident in. Surely if there were a 50-line, self-hostable CAPTCHA script that was dead simple for humans and difficult for bots to beat, it would be in wide use.

reCAPTCHA changed to its current model to try to significantly reduce friction in the "hopefully normal" case (down to just a check box if all goes well) because every ounce of friction you add to critical inflection points in your product translates to meaningful lost opportunity.

Even if this weren't a problem, and it were trivial to create something that's easy for humans and hard for computers, it's just not worth most companies' time. Would they rather spend a few days properly implementing and testing a captcha solution, plus some unknown amount of time on future bug fixes and support, or set up reCAPTCHA in 30 minutes and move on to things that produce value for their customers?

I see that as an absolute win. If you're having problems understanding simple math questions then I won't want you as my user in the first place. Morons out.

As for visually impaired users, I agree this one is harder to crack. Usually you do it with audio, which in itself is more than 50 lines of code, but here is my personal approach. Absolutely nothing is stopping you from having a separate step for visually impaired users, like the one described in the OP, where the account is activated by mail. You see, visually impaired users have infinitely more patience than sighted ones. They are used to the web not being friendly, so they won't mind jumping through extra hoops if they want your service. So add a checkbox saying "I am visually impaired and I want registration by e-mail" or something equivalent and you're good to go.