Hacker News new | ask | show | jobs
by eykanal 496 days ago
The problem with this paper is that, while technically true, there are many website owners who have found that CAPTCHAs have effectively reduced the spam on their site to zero. The fact that a CAPTCHA _can_ be bypassed doesn't mean that it _will_, and most spam bots are not using cutting-edge tech because that's expensive.

To say "it's worthless from a security perspective" is a pretty harsh and largely inaccurate representation. It's been tremendously useful to those who have used it. If it wasn't valuable, it wouldn't be so widely used.

Definitely agree with the whole "tons of free $$$ for Google", but that's kind of their business model, so yeah, Google is being Google. In other breaking news, water is still wet.

4 comments

Far too many people talk about security as if it's a simple binary and not about effort levels and dissuading attackers.
People really struggle with things that have measurable, probabilistic effects. You see it with healthcare ("Steve smoked his whole life and never got cancer, so cigarettes aren't bad for you!"), environmental effects ("Alice was poor and she didn't rob anyone, so poverty is no excuse!"), hiring ("Charlie is a great employee and he had no experience, so you should never look at backgrounds!"), etc.

It should be a general standard of proof for any sort of sociological claim that you look at rates, not just examples, but it usually isn't.

Well, I would at least ask what the baseline was. The vast majority of websites on the internet don't really have to deal with sophisticated bot traffic, and a very simple traditional CAPTCHA, one that can be trivially solved using existing technology, will also cut SPAM to zero or very close. I don't know exactly why this is, but I suspect it's because most of the bot operations that scale far enough to hit low volume websites are very sensitive to cost (and hence unlikely to deploy relatively-expensive modern multi-modal LLMs to solve a problem) and not likely to deploy site-specific approaches to SPAM.

There are a lot of things that can trivially cut down SPAM ranging from utterly unhelpful to just simply a bad idea. Like for example, you can deny all requests from IPs that appear to be Russian or Chinese: that will cut out a lot of malicious traffic. It will also cut some legitimate traffic, but maybe not much if your demographics are narrow. ReCAPTCHA also cuts some legitimate traffic.

The actual main reason why people deployed reCAPTCHA is because it was free and easy, effectiveness was just table stakes. The problem with CAPTCHAs prior to reCAPTCHA is simply that they really weren't very good; the stock CAPTCHAs in software packages like MediaWiki or phpBB were just rather unsophisticated, and as a double whammy, they were big targets for attack since developing a reliable solver for them would unlock bot access to a very large number of web properties.

Do you need reCAPTCHA to make life hard for bots, though? Well, no. Having a bespoke solution is enough for most websites on the Internet. However, reCAPTCHA isn't even necessarily the best choice even for something extremely high-volume. Case-in-point, last I checked, Google's own DDoS protection system still used a bespoke CAPTCHA that largely hasn't changed since the early 2010s; you can see what it looks like by searching for the Google "sorry" page.

I agree that reCAPTCHA is not "worthless" but it's worth is definitely overstated. Automated services that solve CAPTCHAs charge less than a cent per-solve. For reCAPTCHA to be very effective against direct adversaries rather than easily-thwarted random bots, the actual value of bypassing your CAPTCHA has to be pretty damn low. At that point, it's very reasonably possible that even hashcash would be enough to keep people from SPAMing.

Yeah, we've used CAPTCHAs to great effect as gracefully-degraded service protection for unauthenticated form submissions. When we detect that a particular form is being spammed, we automatically flip on a feature flag for it to require CAPTCHAs to submit, and the flood immediately stops. Definitely saves our databases from being pummeled, and I haven't seen a scenario since we implemented it a few years ago where the CAPTCHA didn't help immediately.

Reminds me of the advice around the deadbolt on your house - it won't stop a determined attacker, but it will deter less-determined ones.

Cutting edge tech like paying cents to captcha solving services?!
Yep. You'd be surprised how stupid some bots are.

(And while I don't have hard data on this, I suspect that bot authors that don't know how to properly set up rate-limits and don't know how to set up captcha solving service bypass, so captchas are especially effective against them)

You can easily get an API key from 2Captcha, AntiCap, CapSolver, etc. - it costs basically nothing and it works.