by b65e8bee43c2ed0 20 days ago
it's all for nothing, because Cloudflare's scraping protection works about as well as a $5 padlock - good enough to dissuade bored teens, not good enough to dissuade even an amateur burglar. if someone wants to scrape your publicly visible data, they will. there's nothing you can do.
5 comments

At the same time: it sure works well enough to annoy anyone with a "bad ASN" IP with 80 captchas a day.
Exactly, that's what I was thinking... like the day they provided a solution to the issue they posed.
It's how I remember I've left my VPN on
Exactly. I’m constantly amazed at how little you actually need to bypass CF, Amazon, Azure WAFs and so on (Incapsula springs to mind too). When you look at the code you’ve come up with, it’s actually quite small and compact.

More to the point, these systems actually help scraping because proof of work unlocks essentially unlimited scraping, in my experience.

That said - from my experience on the other side, sure you can’t stop people like me or you, but you can stop 99% of the others. That’s more than worth it operationally.

What do you mean by ~"PoW unlocks unlimited scraping"?
Usually, after you solve the PoW challenge, sites let you make a lot of requests before asking you to complete another.
> Cloudflare's scraping protection works about as well as a $5 padlock

It sure seems to keep me, the casual visitor, far away from just about any site they "protect". I have zero desire to alter my browsing configuration or use extra tools to get around Turnstile; I'd rather not even visit the site in the first place.

> I'd rather not even visit the site in the first place

Until your bank, airline, and tax ministry start using them.

Even more reason to boycott sites using it now.
I vote with my wallet and dump misbehaving banks.
The overwhelming majority of customers don't even know they can care. And most of them wouldn't anyway. So your vote doesn't matter to anyone but you, sadly.
"Misbehaving" by protecting themselves
If you're willing to do it, a real browser with Playwright is enough.
Playwright isn't sufficient for all cases.
Not for high volumes of data.
It is if you're willing to pay the extra overhead. E.g., Google and MS both use rendered pages for advanced scraping.
$5 padlocks work against what most website owners care about: the common consumer who is using a different app and seeing their site content with someone else's ads on top of it.