
by dcow 1052 days ago

I sympathize with your frustration, but you also have to admit that Cloudflare is tasked with an impossible problem: from a sea of requests, identify those that are coming from robots that are disguised as humans.

So there is no perfect solution. You can't use strong identity because a user can share their identity with a robot. You have to use a crappy heuristic that only works most of the time (or tell site owners it's an application layer problem and use this SaaS solution to solve the problem).

I mean, you admitted that you run a crawler. Cloudflare has detected that you run a crawler and wants you to prove that you're human to access sites on their network. It actually sounds like their product worked.

In any event, there should probably be better regulation around how this blocking is handled so that users aren't being unjustly blocked. If you want to run a crawler, how do you do it ethically so that you aren't targeted and your traffic blocked? If Cloudflare blocks you from accessing one site should that block extend across their whole network? How long should it last? How do you appeal the block if Cloudflare's heuristics falsely block you? If you're in a life and death situation and need immediate access to medical information and Cloudflare unjustly blocks your access and it causes harm, who's at fault? Etc.

4 comments

eddieroger 1052 days ago

> but you also have to admit that Cloudflare is tasked with an impossible problem

They're not tasked with anything. They choose to sell a bot detection and mitigation platform as a product, and that's a hard business to be in. If they think they can do it, great. If they can't, they shouldn't try.

cj 1052 days ago

The thing I don't understand is why all of the blame is being placed on Cloudflare as a company.

Why not place the blame on the people who are configuring Cloudflare to behave in this way?

I'm a happy Cloudflare Enterprise customer, and our DDoS settings are "Off", we don't present captchas to end users, we don't block any traffic, and we've disabled all of Cloudflare's managed rulesets.

It's very possible to use Cloudflare with all of the security features switched off. The features causing the author's issues are features that can be disabled by the site owner. Cloudflare has power over what they recommend as the default settings, but ultimately it's up to the site owner to choose how to configure Cloudflare for their site.

I think there could be a healthy debate around Cloudflare's default account settings, but I'm surprised by the number of people here dismissing the fact (or maybe not aware of the fact?) that all of these are features that can be turned off. The owner of the site chose to keep bot protection, visitor verification and related features turned on.

btully 1050 days ago

I agree 100%. While I wouldn't go so far as turning off all of the DDoS settings and managed rulesets (why pay for it then?), you can certainly set the "secure/strict" level to medium or low and still retain benefits.

I'm wondering if it's related to Cloudflare's new/updated Bots features, especially the "Super Bot Fight Mode" feature -- which I believe gets a default setting that is super strict.

As others have mentioned, saner defaults might help, but I guess they want to err on the side of "more secure" vs a less secure default.

usere9364382 1046 days ago

If the "feature" says "block bots", and it is blocking people, then Cloudflare is to blame, not the users who enabled the feature.

petre 1052 days ago

> Why not place the blame on the people who are configuring Cloudflare to behave in this way?

Sane defaults. Of course everyone would turn DDoS protection on.

drivebycomment 1052 days ago

So are you declaring nobody should be in that business of bot protection then?

account42 1051 days ago

Yes.

Blocking all crawlers except Google bot is itself a problem.

There should not be any bot protection, only abuse (e.g. DDOS) protection. Block disruptive behaviors, not fingerprints.

dcow 1051 days ago

But they are doing it and succeeding. No product is 100% perfect. The problem is that when it’s not perfect people can ostensibly (and arguably actually) be harmed if they can’t access content on the Cloudflare network. This is why we need more scrutiny around how large internet platforms deploy bot mitigation technology. We don’t need to tell people “sorry just suffer DoS attacks”.

dontupvoteme 1051 days ago

Is only Google allowed to crawl?

account42 1051 days ago

Cloudflare is not tasked with anything, they have chosen to take on a task. That that task happens to be impossible does not get them any sympathy for the collateral damage they do while trying.

ipaddr 1052 days ago

Why are only humans allowed? Shouldn't we be proactive and accept robots as equals now? We have a history of prejudice against groups, and we seem clueless that we are heading there again.

lwansbrough 1052 days ago

Have you ever run an open resource with significant traffic before? People are absolutely abusive with their use of public websites and APIs. “This is why we can’t have nice things” is as relevant as ever.

Cloudflare provides a vital service that solves a real problem that breaks non-pragmatists' brains.

danShumway 1052 days ago

> that breaks non-pragmatists' brains

Oftentimes when people say this, what they really mean is that they have different opinions about which tradeoffs are tolerable and which tradeoffs aren't.

Captchas are a nightmare for accessibility. Turnstile was designed to solve that problem, but is a nightmare for privacy-oriented and non-standard setups. Getting rid of both systems and blocking based purely on behavior or building entirely new metrics to block on would absolutely be a nightmare for website security.

It's all tradeoffs, but some of those tradeoffs get labeled as "pragmatic" and some of them get labeled as "idealistic" -- mostly just based on the personal values of whoever is making that distinction. The reality is that no matter which direction we go, somebody is going to get the short end of the stick. We all want to minimize harm, but we disagree about who that somebody getting the short end of the stick should be and how short of a stick they should get.

I agree that it's idealistic to claim that we can just let automated agents access any website and that it wouldn't be a nightmare for security. However it is equally idealistic to claim that it is possible to fully secure websites against automated attacks without restricting disabled people, violating user autonomy, or harming the overall health of the open web. I do have sympathy for Cloudflare; they are trying to solve an impossible challenge. That's the key word: it's actually impossible. It's a challenge that can't be solved, we can only do the best we can do and that means accepting tradeoffs both for site security and for accessibility and access.

I disagree with Cloudflare about the exact degree to which solving that challenge justifies and excuses harming the open web and I disagree with Cloudflare's idealistic fantasy that fully solving that challenge is possible without significantly harming the open web. I disagree with some of their product directions and metrics not because I'm idealistic about alternatives but because I'm realistic about the outcomes of what Cloudflare is doing right now.

account42 1051 days ago

So block clients that are being abusive, not "bots".
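
To make the distinction concrete, here is a rough sketch of a check that judges a client only by what it does, never by what it claims to be. Everything in it is an assumption for illustration: the `ClientStats` fields, the `is_abusive` name, and the thresholds are hypothetical and do not describe how Cloudflare or any real product decides.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Hypothetical per-client counters collected over the last minute."""
    requests_last_minute: int
    error_responses_last_minute: int
    user_agent: str  # recorded, but deliberately never consulted below


def is_abusive(stats: ClientStats,
               max_requests: int = 600,
               max_error_ratio: float = 0.5) -> bool:
    """Flag disruptive behavior only; thresholds are made-up defaults."""
    # Sheer volume: more requests than the site is willing to serve one client.
    if stats.requests_last_minute > max_requests:
        return True
    # Probing: mostly failing requests, once there are enough to judge.
    if stats.requests_last_minute >= 20:
        ratio = stats.error_responses_last_minute / stats.requests_last_minute
        return ratio > max_error_ratio
    return False


# A self-declared crawler that behaves politely is let through...
polite_crawler = ClientStats(30, 1, "ExampleCrawler/1.0")
# ...while a client with a browser-like user agent that hammers the site is not.
hammering_browser = ClientStats(5000, 10, "Mozilla/5.0")
```

Under these assumptions `is_abusive(polite_crawler)` is false and `is_abusive(hammering_browser)` is true, which is the opposite of what a user-agent fingerprint would conclude. It still needs some way to tell one client from another, which is the hard part raised further down the thread.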

dcow 1052 days ago

Of course I'd agree that if a robot is following the rules and behaving indistinguishably from a human but maybe just a little more quickly, then it shouldn't be pre-judged (and our detection should accommodate). But here we're talking about robots without agency being e.g. used in botnets to abuse services, or otherwise not following the rules.

ipaddr 1052 days ago

All clients follow the rules if you enforce them. Break the rate limit and get a timeout. Settle payment before you send the product, using bitcoin instead of Visa, which is not able to do this.

semiquaver 1052 days ago

You’re so close to getting it.

> Break rate limit and get a timeout

And what exactly should the rate limit key be? From your username I’m sure you are aware that it can’t be the IP address.

It sounds like you’re coming at this from an authenticated API perspective where client identity is a given and anonymous access is the exception. The web inverts this, making everything much more difficult and necessitating the sort of fingerprinting that is at issue in this article and I presume you are opposed to.
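
A small sketch of why the key matters, assuming a simple fixed-window limiter that hands out a timeout when the limit is broken. The `RateLimiter` class, its default numbers, and the `demo_shared_ip` helper are all hypothetical; the address comes from a documentation-only range and stands in for a NAT gateway or shared proxy.

```python
from dataclasses import dataclass, field


@dataclass
class RateLimiter:
    """Fixed-window limiter: exceeding `limit` in one window earns a timeout."""
    limit: int = 3         # requests allowed per window (hypothetical value)
    window: float = 1.0    # window length in seconds (hypothetical value)
    timeout: float = 10.0  # penalty in seconds after breaking the limit
    _counts: dict = field(default_factory=dict)   # key -> (window_start, count)
    _blocked: dict = field(default_factory=dict)  # key -> time the block ends

    def allow(self, key: str, now: float) -> bool:
        # Still serving a timeout: reject without counting.
        if self._blocked.get(key, 0.0) > now:
            return False
        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        count += 1
        self._counts[key] = (start, count)
        if count > self.limit:
            self._blocked[key] = now + self.timeout
            return False
        return True


def demo_shared_ip() -> list:
    """Key by IP: a script and an unrelated person share one address."""
    limiter = RateLimiter()
    shared_ip = "203.0.113.7"  # assumed NAT gateway address
    # The script sends four requests at once; the fourth breaks the limit.
    script = [limiter.allow(shared_ip, now=0.0) for _ in range(4)]
    # Half a second later a person behind the same address makes one request.
    person = limiter.allow(shared_ip, now=0.5)
    return script + [person]
```

With these numbers the person's single request is rejected, because the limiter cannot tell them apart from the script that tripped the timeout. Swapping the key for something that does separate them (a cookie, an account, a fingerprint) is exactly the tradeoff being argued about here.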

enigmurl 1052 days ago

Isn't the point that Cloudflare is essentially enforcing the rules then?