by blakesterz 2263 days ago
> "Earlier this year, Google informed us that they were going to begin charging for reCAPTCHA. That is entirely within their right. Cloudflare, given our volume, no doubt imposed significant costs on the reCAPTCHA service, even for Google."

Even in the article they say... "Google provided reCAPTCHA for free in exchange for data from the service being used to train its visual identification systems." ... I thought this was one of those win/win things... Google gets something, websites get something... what's changed? Is Google not getting much out of reCAPTCHA now?

7 comments

In the article they also say:

> Again, this is entirely rational for Google. If the value of the image classification training did not exceed those costs, it makes perfect sense for Google to ask for payment for the service they provide.

This might be exacerbated in the case of Cloudflare. Imagine a system where 99% of the visitors being challenged are human. The data gathered from such visitors is high-quality data. That fits the use case of validating an anonymous poster on some random blog. Now consider the Cloudflare use case. Visitors will only be challenged when Cloudflare already suspects they are bots. Most of the challenges are served to bots. The data is much lower quality, but Google's cost per challenge has remained the same.

It could just be that as this type of use case became dominant, the balance of value tipped.
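A rough way to see how the balance could tip, as a sketch only: assume each challenge costs Google one arbitrary unit to serve, and that only human-solved challenges yield usable training data. Both are assumptions for illustration, not anything Google or Cloudflare has published. The 99% figure is the blog example above; the 0.5% figure is the solve rate another commenter reports for their Cloudflare rules, used here as a stand-in for the human fraction.

```python
def break_even_label_value(cost_per_challenge: float, human_fraction: float) -> float:
    """Value one human-solved challenge must yield for the service to break even.

    Hypothetical model: only human answers are useful, every challenge costs the same.
    """
    if not 0 < human_fraction <= 1:
        raise ValueError("human_fraction must be in (0, 1]")
    return cost_per_challenge / human_fraction


COST = 1.0  # arbitrary unit, an assumption

blog = break_even_label_value(COST, 0.99)        # mostly human visitors
bot_heavy = break_even_label_value(COST, 0.005)  # pre-filtered, mostly bots

print(f"blog-style site: each label must be worth {blog:.2f} units")
print(f"bot-heavy site:  each label must be worth {bot_heavy:.0f} units")
```

Under those assumptions, a label from pre-filtered traffic has to be worth roughly 200 times as much as one from an ordinary blog before the free service pays for itself.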

I guess this is very true. Our quite elaborate Cloudflare Firewall setup, which combines bot management scores with GeoIP and network information to decide on the action, has solve rates below 0.5% on most rules.

The only case where we see up to 3% solved is on rules targeting networks that contain mostly free (as in beer) VPN providers (the new pest of the internet). Those networks send a lot of malicious and automated traffic, with about 3% real users mixed in.

To put this into numbers for the past 24h:

~76 million requests served
~1 million of those were captchas
~0.5 million were outright blocked
Captchas solved: 1233
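For anyone who wants the ratios spelled out, here is a quick calculation using only the figures quoted above. The inputs are the commenter's rounded ("~") numbers, so the results are approximate.

```python
# Figures from the comment above (rounded, past 24h)
requests_served = 76_000_000
captchas_served = 1_000_000
blocked = 500_000
captchas_solved = 1233

solve_rate = captchas_solved / captchas_served     # share of captchas that were solved
captcha_share = captchas_served / requests_served  # share of requests that got a captcha
block_share = blocked / requests_served            # share of requests blocked outright

print(f"captcha solve rate:  {solve_rate:.3%}")
print(f"requests challenged: {captcha_share:.2%}")
print(f"requests blocked:    {block_share:.2%}")
```

That works out to a solve rate of roughly 0.12%, so only about one in every 800 challenges served ended with a human-looking answer.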

Seeing that reCAPTCHA v3 doesn't use endless streams of images any more, I would guess that Google no longer benefits much from having users tag storefronts, traffic lights, buses or fire hydrants. Maybe their image recognition algorithm is past that stage.
It does as a fallback. But you’re missing the main point of v3, which is that it shifts the legal onus of blocking from Google to the integrating site. No longer can Google be sued for accessibility violations, if it’s the site that’s stopping the user from entering purely on a suggestion from Google.
Just because you do some technical workarounds doesn't mean you get a legal free pass.

I don't think this aspect mattered much, because it was always the site's decision to use reCAPTCHA and that didn't change.

I also don't think Google gets much profit out of the image tagging part anymore; they already have a huge database of tagged images.

> Seeing that reCAPTCHA v3 doesn't use endless streams of images any more

On the other hand, I've been effectively banned from several sites because I don't accept third-party requests to Google from non-Google sites as a result of this change.

Pure speculation, but at some point your dataset is large enough.

The original reCAPTCHA corrected errors in scanned books published decades/centuries ago. At some point, they're all fixed.

Similarly, the more recent challenges have all been traffic images. And they probably have way more than enough now -- at least of the type that can be done by reCAPTCHA.

So unless Google comes up with a new mass-categorization problem easy enough for literally everyone to do and simple and small enough to fit in a reCAPTCHA... then they charge.

I think using captchas for image recognition was one of the most ingenious strategies of the modern web. Don't think Google is making the correct move here.

Overall, I would like to see these checks removed; Cloudflare uses them quite excessively.

> Google provided reCAPTCHA for free in exchange for data from the service being used to train its visual identification systems.

Has this been true lately? Every time I see it, it gives me the same images from a set of 3. 90% of the time it's classifying street lights, and it's the same street lights every time. About 7% of the time, it's pictures with cars in them, and again, it's the same pictures most times (but in a different order, I think). The remaining times it's fire hydrants or store fronts, often in a language I can't read, so I don't know if it's a store or not. (And again - mostly the same images each time.)

It's probably a question of size. Same as with Google Analytics. Google can afford to offer it free of charge for smaller websites but charges for larger ones. Cloudflare was probably one of the heaviest users, with a very high percentage of bots (as they're good at pre-filtering).
my bet is that the bean counters have caught up with this product, and it'll be run into the ground with excessive pricing, because Google products have to make millions or otherwise they'll be killed. most notably, Reader.
These complaints about Google "moving too fast" used to really confuse me. I couldn't really spot a meaningful difference in mean survival between Google products, start-ups similar to individual Google products, and other businesses' behaviour.

But I've now attained zen-like clarity on the issue: the complaints are coming only, and always were coming mostly, from people whose idea of appropriate change over time is to still complain about Google Reader almost a decade after it happened.

this is being intentionally obtuse. Hire shutting down got 200 points 7 months ago: https://news.ycombinator.com/item?id=20815293, also see: Hangouts, Google+, Nest, Code Search, Site Search.....

it's not about "moving fast" at all. it's about google killing anything that doesn't make millions as opposed to just thousands (enough for basic maintenance). I never said anything about timeframe.