by scarab92 454 days ago
Its success as a business aside, at a technical level neither Cloudflare nor its competitors provide any real protection against large-scale scraping.

Bypassing it is quite straightforward for most software engineers of average competence.

I'm not saying that Cloudflare is any better or worse at this than Akamai, Imperva, etc. I'm saying that in practice none of these companies provide an effective anti-bot tool, and as far as I can tell, as someone who does a lot of scraping, the entire anti-bot industry is selling a product that simply doesn't work.

1 comments

In practice they only lock out "good" bots. "Bad" bots have their residential proxy botnets and run real browsers in virtual machines, so there's not much of a signature.

This often suits businesses just fine, since "good" bots are often the ones they want to block. A bot that would transcribe comments from your website to RSS, for example, reduces the ad revenue on your website, so it's bad. But the spammer is posting more comments and they look like legit page views, so you get more ad revenue.

I don't believe that distinction really exists anymore.

These days everyone is using real browsers and residential / mobile proxies, regardless of whether they are a spammer, a Fortune 500 company, a retailer doing price comparison, or an AI company looking for training data.

Random hackers making a website-to-RSS bridge aren't using residential / mobile proxies and real browsers in virtual machines. They're doing the simplest thing that works, which is curl, then getting frustrated and quitting.

Spammers are doing those things because they get paid to make the spam work.