by shiomiru 1052 days ago
The cause of the problem is that your software is faulty by design.

1. IP addresses are to be used for packet routing. Certainly not for assigning "behavior scores" to users in the background. IP addresses say nothing about your visitors; my IP address could have been a complete stranger's IP address yesterday.

2. Deciding who can access half the web based on their TLS signature achieves nothing in the long run except reinforce browser monopolies, and goes completely against the spirit of the open web.

I guess now I have to use Chrome for browsing the web from home. Yes, I do run a crawler-like bot as a hobby project, so I got what I was asking for. (Funnily enough, it still works if I just emulate Chrome's TLS signature.) But I also have friends who have done absolutely nothing of the sort (no technical skills), and still got caught up in this latest ban wave.

Let's be honest here. Your service has likely caused harm to millions of people who, from one day to the next, are suddenly blocked from half the WWW: not just nerds, who can get around that one way or the other, but real users who just got unlucky and now are potentially blocked from accessing websites required for their daily lives (welcome to the 21st century). This is not a one-time problem, it has been going on for years; this time it just came too suddenly for too many people. And this kind of harm is a logical conclusion of the heuristics you use for determining who can view a website.

Never mind that it's ridiculous how a single company from outside my country has the power to decide on whether I can use the web or not. That's kind of on website owners unconditionally giving this power to CF anyway.

Now, allow me to return to purchasing proxies from shady sources for myself, so I can keep using Firefox. Thanks and keep up the good work.

8 comments

I sympathize with your frustration, but you also have to admit that Cloudflare is tasked with an impossible problem: from a sea of requests, identify those that are coming from robots that are disguised as humans.

So there is no perfect solution. You can't use strong identity because a user can share their identity with a robot. You have to use a crappy heuristic that only works most of the time (or tell site owners it's an application layer problem and use this SaaS solution to solve the problem).

I mean, you admitted that you run a crawler. Cloudflare has detected that you run a crawler and wants you to prove that you're human to access sites on their network. It actually sounds like their product worked.

In any event, there should probably be better regulation around how this blocking is handled so that users aren't being unjustly blocked. If you want to run a crawler, how do you do it ethically so that you aren't targeted and your traffic blocked? If Cloudflare blocks you from accessing one site should that block extend across their whole network? How long should it last? How do you appeal the block if Cloudflare's heuristics falsely block you? If you're in a life and death situation and need immediate access to medical information and Cloudflare unjustly blocks your access and it causes harm, who's at fault? Etc.

> but you also have to admit that Cloudflare is tasked with an impossible problem

They're not tasked with anything. They choose to sell a bot detection and mitigation platform as a product, and that's a hard business to be in. If they think they can do it, great. If they can't, they shouldn't try.

The thing I don't understand is why all of the blame is being placed on Cloudflare as a company.

Why not place the blame on the people who are configuring Cloudflare to behave in this way?

I'm a happy Cloudflare Enterprise customer, and our DDoS settings are "Off", we don't present captchas to end users, we don't block any traffic, and we've disabled all of Cloudflare's managed rulesets.

It's very possible to use Cloudflare with all of the security features switched off. The features causing the author's issues are features that can be disabled by the site owner. Cloudflare has power over what they recommend as the default settings, but ultimately it's up to the site owner to choose how to configure Cloudflare for their site.

I think there could be a healthy debate around Cloudflare's default account settings, but I'm surprised by the number of people here dismissing the fact (or maybe not aware of the fact?) that all of these are features that can be turned off. The owner of the site chose to keep bot protection, visitor verification and related features turned on.

I agree 100%. While I wouldn't go so far as turning off all of the DDoS settings and managed rulesets (why pay for it then?), you can certainly set the "secure/strict" level to medium or low and still retain benefits.

I'm wondering if it's related to Cloudflare's new/updated Bots features, especially the "Super Bot Fight Mode" feature -- which I believe gets a default setting that is super strict.

As others have mentioned, saner defaults might help, but I guess they want to err on the side of "more secure" vs a less secure default.

If the "feature" says "block bots", and it is blocking people, then Cloudflare is to blame, not the users who enabled the feature.
> Why not place the blame on the people who are configuring Cloudflare to behave in this way?

Sane defaults. Of course everyone would turn DDoS protection on.

So are you declaring nobody should be in that business of bot protection then?
Yes.

Blocking all crawlers except Google bot is itself a problem.

There should not be any bot protection, only abuse (e.g. DDOS) protection. Block disruptive behaviors, not fingerprints.

But they are doing it and succeeding. No product is 100% perfect. The problem is that when it’s not perfect people can ostensibly (and arguably actually) be harmed if they can’t access content on the Cloudflare network. This is why we need more scrutiny around how large internet platforms deploy bot mitigation technology. We don’t need to tell people “sorry just suffer DoS attacks”.
Is only Google allowed to crawl?
Cloudflare is not tasked with anything, they have chosen to take on a task. That that task happens to be impossible does not get them any sympathy for the collateral damage they do while trying.
Why are only humans allowed? Shouldn't we be proactive and accept robots as equals now? We have a history of prejudice against groups, and we seem clueless that we are heading there again.
Have you ever run an open resource with significant traffic before? People are absolutely abusive with their use of public websites and APIs. “This is why we can’t have nice things” is as relevant as ever.

Cloudflare provides a vital service that solves a real problem that breaks non-pragmatists' brains.

> that breaks non-pragmatists' brains

Oftentimes when people say this, what they really mean is that they have different opinions about which tradeoffs are tolerable and which tradeoffs aren't.

Captchas are a nightmare for accessibility. Turnstile was designed to solve that problem, but is a nightmare for privacy-oriented and non-standard setups. Getting rid of both systems and blocking based purely on behavior or building entirely new metrics to block on would absolutely be a nightmare for website security.

It's all tradeoffs, but some of those tradeoffs get labeled as "pragmatic" and some of them get labeled as "idealistic" -- mostly just based on the personal values of whoever is making that distinction. The reality is that no matter which direction we go, somebody is going to get the short end of the stick. We all want to minimize harm, but we disagree about who that somebody getting the short end of the stick should be and how short of a stick they should get.

I agree that it's idealistic to claim that we can just let automated agents access any website and that it wouldn't be a nightmare for security. However it is equally idealistic to claim that it is possible to fully secure websites against automated attacks without restricting disabled people, violating user autonomy, or harming the overall health of the open web. I do have sympathy for Cloudflare; they are trying to solve an impossible challenge. That's the key word: it's actually impossible. It's a challenge that can't be solved, we can only do the best we can do and that means accepting tradeoffs both for site security and for accessibility and access.

I disagree with Cloudflare about the exact degree to which solving that challenge justifies and excuses harming the open web and I disagree with Cloudflare's idealistic fantasy that fully solving that challenge is possible without significantly harming the open web. I disagree with some of their product directions and metrics not because I'm idealistic about alternatives but because I'm realistic about the outcomes of what Cloudflare is doing right now.

So block clients that are being abusive, not "bots".
Of course I'd agree that if a robot is following the rules and behaving indistinguishably from a human but maybe just a little more quickly, then it shouldn't be pre-judged (and our detection should accommodate). But here we're talking about robots without agency being e.g. used in botnets to abuse services, or otherwise not following the rules.
All clients follow the rules if you enforce them. Break rate limit and get a timeout. Settle your payment before you send the product, using bitcoin instead of Visa, which is not able to do this.
You’re so close to getting it.

  > Break rate limit and get a timeout
And what exactly should the rate limit key be? From your username I’m sure you are aware that it can’t be the IP address.

It sounds like you’re coming at this from an authenticated API perspective where client identity is a given and anonymous access is the exception. The web inverts this, making everything much more difficult and necessitating the sort of fingerprinting that is at issue in this article and I presume you are opposed to.
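
To make the disagreement concrete, here is a minimal sketch of "break rate limit and get a timeout". It assumes a simple fixed-window counter; the class name `RateLimiter`, its parameters, and the string keys are all hypothetical. The key is deliberately left to the caller, because what that key should be for anonymous traffic is exactly the open question.

```python
import time


class RateLimiter:
    """Fixed-window limiter: too many requests in one window earns a timeout."""

    def __init__(self, limit, window_s, timeout_s, clock=time.monotonic):
        self.limit = limit          # max requests allowed per window
        self.window_s = window_s    # window length in seconds
        self.timeout_s = timeout_s  # penalty once the limit is broken
        self.clock = clock          # injectable so the sketch is testable
        self.windows = {}           # key -> (window start, request count)
        self.blocked_until = {}     # key -> time at which the timeout ends

    def allow(self, key):
        now = self.clock()
        # Still serving a timeout: reject without counting the request.
        if self.blocked_until.get(key, 0.0) > now:
            return False
        start, count = self.windows.get(key, (now, 0))
        # Start a fresh window once the old one has expired.
        if now - start >= self.window_s:
            start, count = now, 0
        count += 1
        self.windows[key] = (start, count)
        if count > self.limit:
            self.blocked_until[key] = now + self.timeout_s
            return False
        return True
```

The mechanism is the easy part. Keyed by IP address, it punishes everyone behind a shared address; keyed by a fingerprint, it becomes the thing the article objects to.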

Isn't the point that Cloudflare is essentially enforcing the rules then?
You're being a little dramatic. It's incredibly unlikely that millions of innocent users have been blocked, and unless you have data to the contrary you shouldn't make such a claim.

You know what else is harmful to the concept of the open internet? The enormous malicious botnets and other endemic problems that require a solution like CloudFlare.

Data point of N+1, but I haven't been able to place online orders at Petco for about a year now because they use some Cloudflare feature that hates my browser + home internet connection. Other Cloudflare-proxied sites seem unaffected, and I'm not doing any botting/crawling, nor do I have any IoT devices on my home network. There's not enough information provided to be able to do any substantive troubleshooting.

This became irritating enough that it caused two side effects: (a) I stopped shopping at Petco, and (b) I moved a pile of sites off of Cloudflare and stopped recommending them, and now sometimes recommend against them.

Cloudflare is still a good, quick, cheap option for sites that receive unusual volumes of malicious traffic, so I'll still recommend them as a solution to some problems. But, they're not a good default.

So you're mad at Cloudflare because Petco enabled a feature that blocked you? If Petco had developed something in-house that blocked you, would you be mad at the compiler?
Cloudflare offers this service. If Cloudflare offered a service that enabled Petco to do something amazing would you be grateful to Cloudflare? If Cloudflare advertised on its homepage about blocking a DDOS attack on a website would you say, "meh, Cloudflare wasn't responsible for blocking that attack, they only provided a feature. The website blocked the attack."? If not, then why should Cloudflare be immune from criticism when the opposite happens?

Cloudflare offered Petco the features to do this as a product and makes money off of Petco's usage of those features. I do sympathize with the perspective that ultimately tools need to be somewhat neutral and it can be dangerous to forward around responsibility. But "tools are neutral" can also be taken to an absurd degree. This isn't 5 levels of indirection here and it's not Petco going and installing a neutral piece of software that they downloaded from GitHub. Petco is a client. They're turning on toggles that Cloudflare built into their user interface and advertises as features.

There's some level of moral accountability there for how those features are abused. I'm not saying it should be illegal, I'm not saying it shouldn't be allowed, but Cloudflare is definitely at least eligible for criticism. This is a product, it's not Petco abusing Cloudflare's infrastructure; they're using the product as intended and advertised.

...no, I've changed my recommendations for Cloudflare because it may prevent ordinary users from using a site, and insufficient information is provided for troubleshooting purposes, and those users are likely not going to go to extraordinary lengths to report the problem. Even if they do report it, the site won't be able to troubleshoot it either. So, if you don't need it, you're probably better off without it.
> It's incredibly unlikely that millions of innocent users have been blocked

Is there a 'town square' where we can talk about being presented captchas and similar things from 3rd party intermediaries?

I think it's incredibly likely that millions of hours have been wasted on such challenges.

On that note...

https://www.folklore.org/StoryView.py?project=Macintosh&stor...

"Well, let's say you can shave 10 seconds off of the boot time. Multiply that by five million users and thats 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives. That's really worth it, don't you think?"

Imagine if people still thought like this about computers and software.
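
The arithmetic in that quote is easy to check. A quick sketch, assuming one boot per user per day (which "every single day" implies) and a 75-year lifetime (my number, not the story's):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000

seconds_saved_per_boot = 10
users = 5_000_000                        # both figures come from the quote
lifetime_years = 75                      # assumption, not from the quote

seconds_per_day = seconds_saved_per_boot * users
years_saved_per_year = seconds_per_day * 365 / SECONDS_PER_YEAR
lifetimes_per_year = years_saved_per_year / lifetime_years

print(seconds_per_day)               # 50000000
print(round(years_saved_per_year))   # 579
print(round(lifetimes_per_year, 1))  # 7.7
```

So "dozens of lifetimes" is generous under these assumptions, but "a dozen lives" is in the right ballpark.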

Yes. And cookie splash screens! I admire GDPR's intention, but hasn't it been a massive human time sink?

Not to take away from your point, just that it's all a hindrance.

That's more on the websites that track your personal data for non-essential purposes. No tracking means no banners are necessary.
Finer points, my point is just about people wishing to view web pages.
I don't know that most web admins can tell if they should float a banner, so vague is the law.

Technically, I think if you have the default Apache logging configured and you read those logs, you should probably float that banner.

@adammartinetti : maybe you could consider developing a new product where you display a GDPR consent banner once, and then these settings apply to all Cloudflare-proxied websites (by passing this consent information as an additional header to the proxied site)
Sounds inferior to the "no cookies no banner" solution.

The GDPR does not mandate gratuitous and pointless personalised spying, which is the only case that requires consent. Normal operations (say a shop collecting payment details and shipping address to fulfil an order) do not require a consent banner.

Those can at least be blocked with ad blockers and/or disabling JS.
ReCAPTCHA was designed with this in mind: given that we had the need to distinguish humans from bots, it presents problems that are hard for bots to solve, where the resulting output is valuable. So the time consumed isn't wasted.
It's wasted from the perspective of the end user.
Not when the end user turns around and uses Google Maps which is now populated with higher quality fine feature information due to the training of the machine learning system on what traffic controls look like.
Valuable to whom?
I don't get your second point. Two things can be harmful to the open web at once. Cloudflare is definitely not taking the right approach to it, which damages the open web alongside botnets. Also, botnet owners are for some reason extraordinarily nerdy and smart, so they will probably find a way to fool CF every other month. It's a cat-and-mouse game for them, while everyone else is actively harmed both by their botnets and by the increased aggressiveness of CF caused by its incorrect solution.
>You know what else is harmful to the concept of the open internet? The enormous malicious botnets and other endemic problems that require a solution like CloudFlare.

You know what's infinitely worse? Monopolies.

Half these problems can be fixed by banning certain parts of the world. It's just politically shifted out of the Overton window to do that so CF profits greatly.

Every user who makes their way here and posts on this thread probably represents 1,000,000-plus normal users.

An open web is open for everyone/thing not just classes of beings you select. Bots and users can both be malicious and both can be positive.

I agree with the premise that most people don't know how to identify or visibly complain about a given technical problem, and so an HN thread with N anecdotes about the problem likely corresponds to N * F real-world incidents, for some value of F > 1... but claiming it's a factor of a million without any backing evidence is absolutely an overreach.

> An open web is open for everyone/thing not just classes of beings you select. Bots and users can both be malicious and both can be positive.

This I agree with. I run an archiver ~monthly on a subset of my month's browsing history, and I'd hate if that got me blacklisted from Cloudflare-backed sites for a benign purpose. (See also the idea of remote attestation)

That's a pretty good idea. Do you randomly sample, or just exclude some domains? Is there some tool out there that does it for you?
Assembling the list of links to archive is a manual process--I just log them in an Obsidian notebook with a category and summary, and I later post it to my blog. (I don't really think other people care, it's more for me to be able to find past things I've found interesting.)

For the archival process I use ArchiveBox[1] running as a container on my NAS; I just grep through the note for `http|https` and feed the resulting list to the archiver. For everything not-hackernews I set the depth to 1, but for HN threads I do 2 so I grab whatever people may have linked in the comments.

I think there's ways to hook into like, ALL Firefox history or saved posts on reddit, but that's way heavier than what I care for.

[1]: https://archivebox.io/
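
Roughly, the "grep through the note" step looks like the sketch below. This is an illustration rather than my actual script: the function names `extract_urls` and `split_by_depth` are made up for this example, and handing the lists to the archiver is left out.

```python
import re

# Match http:// or https:// up to the next whitespace or bracket character.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(note_text):
    """Return the unique URLs in a note, in order of first appearance."""
    seen = []
    for url in URL_RE.findall(note_text):
        url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url and url not in seen:
            seen.append(url)
    return seen


def split_by_depth(urls):
    """HN threads get depth 2, everything else depth 1, as described above."""
    hn = [u for u in urls if "news.ycombinator.com" in u]
    other = [u for u in urls if "news.ycombinator.com" not in u]
    return {1: other, 2: hn}
```

Each list then goes to the archiver with the matching depth setting.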

Interesting! Firefox history is just SQLite. I might do something like, take all non-search URLs and archive them once a month or so. Thanks for the inspiration.
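
A rough sketch of what I have in mind, assuming the history lives in the `moz_places` table of `places.sqlite` with `last_visit_date` stored as microseconds since the epoch. The function name `recent_urls` and the crude "looks like a search" filter are my own inventions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def recent_urls(conn, days=30, now=None):
    """Return URLs visited in the last `days` days from a places database."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    cutoff_us = int(cutoff.timestamp() * 1_000_000)  # microseconds since epoch
    rows = conn.execute(
        "SELECT url FROM moz_places "
        "WHERE last_visit_date >= ? ORDER BY last_visit_date",
        (cutoff_us,),
    )
    # Hypothetical "non-search" filter: skip anything carrying a q= parameter.
    return [url for (url,) in rows if "?q=" not in url and "&q=" not in url]


# Usage, assuming places.sqlite was copied out of the profile first
# (Firefox may hold a lock on the live file while it is running):
# conn = sqlite3.connect("places-copy.sqlite")
# print(recent_urls(conn))
```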
Cloudflare blocks me every time I open an incognito window. No VPN, just having no cookie towards a domain automatically means I'm a bot…
Right. It feels silly to point out specific things when just about everything about these verification checks deployed by every megacorp throws you into a universe of suffering.

If there's a place to start, it would be with eliminating the infinite challenge loops. Bad enough that IP blocks get outright blocked. Bad enough that I have to decide whether or not that blurred sliver of the edge of the wheel+shadow constitutes being part of the bicycle. Not to mention the humanitarian betrayal of the absolute highest form to farm the free human labor to train AI models when they are simply trying to browse the $#@%ing internet.

You're going by the specified, designed use cases of those technologies.

Every spec is a three-edged sword: the spec, the intent of the spec, and the use of the spec in the wild.

In practice, Cloudflare does a far-better-than-average job of gluing together some heuristics in an unspec'd way to filter traffic. It sucks because you can't plan around it, but that's rather the point, because the malicious actors are trying to plan around it also.

(ETA: Hacker News rate-limited this post. In theory, I could have set up a sock-puppet to try and work around that, but then they would catch that too and I'd be out two accounts. So I just waited out the limit. Measure and counter-measure. ;) ).

Why does HN have such aggressive and seemingly illogical post rate limits anyway?

Is it a theory about increasing quality of communications? It can't be a tech bottleneck.

I don't have this information first hand, but that's my assumption.

Dang was handing them out like candy on January 6th. And I think he was justified in doing so; there was a coup in progress in the United States, so discourse here went completely off the rails.

But it's a very easy to implement method of throttling volume, which helps improve the conversation by minimizing opportunities for people to gish-gallop. You can email and ask to have it removed; I have refrained from doing so because it serves as a gentle reminder not to get dragged down in the lowest common denominator of what passes for discourse on the site from time to time.

> Let's be honest here. Your service has likely caused harm to millions of people who, from one day to the next, are suddenly blocked from half the WWW

If this was true, Cloudflare wouldn't be a good product used by a lot of sites.

Excluding people who are poor, weird, privacy-conscious or otherwise inconvenient from your site is a feature not a bug, especially when you can pretend it's an accident.
You're assuming website owners are aware of the issue… How would they know? Cloudflare is just telling them it blocked a bunch of bots.
They have before and after analytics.....
They will presume the before traffic was bots? Unless they also see a drop in sales or ads they won't notice.
I mean if sales didn't drop why would they care?
If it's a website that doesn't sell anything they won't notice.
That's a false dichotomy. It is both a good and a bad product, depending on perspective.

To a large firm, 1% failure is acceptable. To the affected 1%, it's a disaster. Consider wrongful imprisonment as an example.

The penalties for being excluded from the web are fairly severe, and looking to become more so. CF is fairly lean; there is no available human to operate an escape hatch for when things go wrong.

When I'm king, every block or account suspension must provide a phone number, and hang the inefficiency.

And even when it doesn't block you completely, it delays website loading, makes you jump through frustrating captchas, etc.

It's probably third in the list of frustrating web behaviors in the past couple of years (behind GDPR popups and registration/paywalls that seem to have gotten much worse recently).

And somehow there are some sites that I get CF delay walls on every time I visit.

This feature is utterly broken for a good web experience; it pushes users away from sites which use it.

Every time that "checking your browser" page comes up for a legitimate user should be considered a failure. Sure, it can maybe happen a few times in a thousand, but the feature is utterly broken if it comes up every time I visit the same site from the same browser not in private mode.

It's worth noting that the websites that you are visiting chose Cloudflare, and have enabled the features that irritate you. They have browser integrity enabled, have bot protection enabled, maybe turned the security level up (gitlab famously is a nuisance because they lean heavily on Cloudflare for protection). Sometimes they've wholly barred VPNs or entire geographic areas! And that is entirely the decision of website operators, and note that they did all of this before Cloudflare came along.

Cloudflare's customers are website operators, not you the end user. Those website operators seem pretty pleased with the service, so clearly they are doing a good job for the people who they are building it for.

And every Cloudflare customer is a company I won't do business with (unless there is absolutely no way around it)

Cloudflare is running the single biggest, most blatant man-in-the-middle attack in history, and far too many people are happy about it

Agreed. The same goes for the 3rd party "data privacy" popups which simply hide a long list of opt-outs several layers deep in a Vendors list. I refuse to use such sites and I let them know by email.
In what way is it an attack? (I know what a mitm is, I'm not asking you to explain that - I'm pretty conversant in the concept of a proxy, I'm asking you to explain why it's an attack specifically)
they block and/or slowdown vast swaths of the internet

if that's not an "attack", I don't know what is

I don't get it. They offer a service that people choose to sign up for and take active steps to use. I don't see how that's an attack. Honestly I'm still trying to understand who is being attacked.

Like is it an attack on the site owner - are you saying cloudflare is extorting them or something? That seems unlikely but I agree that would be a form of attack... it also doesn't seem to be what you're saying.

Is it an attack on the user of the website because the website owner successfully denies visitors it does not want? Does that mean that login credentials are a form of attack too? Would an on-prem load balancer or WAF that dropped all traffic from a region or matching patterns still be an attack?

It just doesn't make sense that it's an attack.

And this is precisely why I don't bother reporting Cloudflare's failures to site operators anymore. I used to do it, when it was pretty infrequent. Site operators were usually concerned that something was blocking customers, but most were clueless about what was causing it or how to fix it.

Eventually I gave up. I don't even bother with their captchas or other stupid human tricks anymore. Whenever Cloudflare gets between me and the site I'm trying to use, I move on and shop somewhere else. Life's too short for this.

Why not take a screenshot of the CF error and send it to the website owner? It would freak me out if I thought a significant number of my website's users were being blocked by CF.
I’ve done this before, and the response is always “this is the first time I’ve seen this” and “you must be a bot operator”.
+1 to anybody who creates a site to name and shame CF customers who block legitimate traffic. For a few months now I've been taking screenshots every time this happens, but with no end goal. Complaining to the individual site owners feels like a lifetime commitment, and there are virtually none I need that badly.
I have done that numerous times. Even sent a screen recording of the Cloudflare spinner of hell. The response is always the same: you must be running some shady software on your machine.

Cloudflare is acting as judge and executioner, and site owners never accept that the product may be faulty.

They will just tell you to use unmodified Chrome.

And soon with Web Integrity API they may start telling you to use Chrome on Windows or macOS, rendering Linux completely unusable.

The work needed to maybe get it past outsourced customer support is not at all worth the effort for any site I don’t actually need to use
How do you send them that when you can't access the contact form and/or contact information on their site because Cloudflare blocks it? (assuming a normal visitor, not someone who knows about whois etc.)
Send it to the domain contact from WHOIS information.
Your alt solution is what? Everyone should build their shit to handle millions of TB/s of DoS traffic?
Block the countries it comes from?
This is what Cloudflare already does, and it's hellish for users.
It sure would be nice if there was a reason DoS could only come from countries other than those of your customers/users. But that's not the case.
This is a very nasty comment. I was wondering if I could find some things that could lead to an exception.

But I'm pretty sure that millions of users aren't using stuff like w3m pager ( https://news.ycombinator.com/item?id=34175754 )

We're all technical here, we are the edge cases. We use exotic software / combos. Let's not get carried away here

The PM of cloudflare uses Firefox, I sometimes use Firefox and I don't notice any difference ( concerning this use-case at least).

If you want help, perhaps describe the actual use-case that is blocking you to him. He shared his email.

- country

- software ( VPN, ... )

- browser

- OS

- traceid

- ...

Either way, buying shady proxies as you mentioned is already a warning flag.

While using Firefox is not :)