by JimWestergren 1341 days ago
A great opportunity for CloudFlare to win some goodwill and PR by helping out EasyList for free right now.

But what about simply enabling a firewall rule and showing a captcha or similar if the origin IP is from India and requesting that URL, until the situation is under control? I did that with the free plan recently in CloudFlare in a similar situation and it worked perfectly (of course on a much smaller scale).

10 comments

The apps behind these requests cannot render the captcha, as the fetch is happening in the background.

However what you can do is match the user-agents, and return a global/catch-all adblocking rule that blocks all the content of all the pages (by blocking the body element).
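A minimal sketch of that idea, assuming the server can see the User-Agent header and that the offending apps send a recognizable one. The marker strings and the function name are hypothetical (the thread does not name the apps), and `##body` is the generic element-hiding form of Adblock Plus filter syntax, which some clients may refuse to apply:

```python
# Hypothetical User-Agent substrings; the real offending apps are not named here.
ABUSIVE_UA_MARKERS = ("ExampleBrowser/", "SampleAdblockApp/")

# A tiny replacement list: comment lines for the developers, then one generic
# element-hiding rule that hides the <body> element on every page.
CATCH_ALL_LIST = "\n".join([
    "[Adblock Plus 2.0]",
    "! This client downloads the list far too often.",
    "! Please cache the list locally or serve it from your own mirror.",
    "##body",
]) + "\n"


def select_list_body(user_agent: str, real_list: str) -> str:
    """Return the catch-all list for flagged clients, the real list otherwise."""
    ua = user_agent or ""
    if any(marker in ua for marker in ABUSIVE_UA_MARKERS):
        return CATCH_ALL_LIST
    return real_list
```

The replacement list is only a few lines long, so serving it costs far less bandwidth than the full file.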

The app developers are going to notice the issue very fast (because users are reporting the problem), and mirroring the lists or adding a cache is immediately going to be their priority.

Bonus: I think some browsers and extensions can execute JavaScript in adblocking rules; https://help.eyeo.com/adblockplus/snippet-filters-tutorial

(which is essentially re-using a gigantic XSS in order to notify the user)

Generally, I like the idea with the user agents filtering and “block everything” rule. No need for geoblocking. Insert a comment about why this is happening and ask for it to be changed.

However, as we’re living in the real world and the authors of the respective browsers strike me as lazy or uninterested, I also bet all that would change is the user agent.

"User agent" is a synonym for "browser". When you say "user agent" here, what you really mean is the contents of the header that identifies the user agent, i.e., browser. Calling it that is a little bit like referring to Chrome's developer tools as "Inspect Element" (based on the mistake that that's supposed to be its name, rather than recognizing that the label is just a simple, descriptive verb/action).
I think the idea was to block users without technically consuming bandwidth. A captcha is equivalent to blocking.
Blocking all page content to knowingly cause unintended behavior… I wonder if this can be considered criminal.

I read that poisoning your own lunch to catch a workplace fridge thief could be considered assault.

EDIT: here’s what I read. https://law.stackexchange.com/questions/966/can-one-be-liabl...

Imagine, say, you update the list to block all URLs, and it impacts some municipal government worker’s ability to update some emergency alert service and causes hundreds of people to be permanently injured.

I don't think so. Google often knowingly and intentionally breaks apps (through API deprecation) because it's more convenient for them or because the old API is costly to maintain. Nothing criminal there.

Same for Easylist, if they decide that a quota of 100000 requests per IP+UA per day is the maximum, that's their choice. They owe nothing to the consumers of the lists.
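A quota like that could be sketched roughly as follows. This is an in-memory illustration only; the class name and the way the current day is passed in are assumptions, and a real deployment would need shared state across servers:

```python
from collections import defaultdict

DAILY_QUOTA = 100_000  # the figure suggested in the comment above


class DailyQuota:
    """Count requests per (IP, User-Agent) pair and reset when the day changes."""

    def __init__(self, limit: int = DAILY_QUOTA):
        self.limit = limit
        self.day = None
        self.counts = defaultdict(int)

    def allow(self, ip: str, user_agent: str, day: str) -> bool:
        # Start a fresh set of counters whenever a new day begins.
        if day != self.day:
            self.day = day
            self.counts.clear()
        key = (ip, user_agent)
        if self.counts[key] >= self.limit:
            return False  # over quota: the caller can answer with HTTP 429
        self.counts[key] += 1
        return True
```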

That being said; Easylist actually benefits from being distributed in many apps; it is really valuable to influence / control adblocking lists, so the more flexible they are to the browser developers, the better (I guess).

I think you misunderstood what parent was referring to. The idea was to poison the block list so that any browser matching their criteria (user agent belonging to DDOSing browser) would block everything.
If an application can't handle failed web requests that application is already broken. Web requests can and will fail at any time.
No one is forcing anyone to use this tool, they have every right to send an alert indicating the product a user is using has been abusing their service.

Very much in the same way that image hosts used to swap in a different image for those hotlinking directly to images in the early days of the net.

I appreciated parent's comment because it points in an interesting direction. No one is forcing anyone to use this tool, no one is forcing anyone to steal their food. In terms of individuals acting in line with expectations, the individual poisoning their own food as a trap shouldn't inconvenience anyone if everyone's being civilized.

Providing a service (which you expect others to consume) and then not only deciding to refrain from providing, but "poisoning" the output, is an interesting move. We don't consider them equivalent, but in a case where this application was providing some essential service that is not easily replaced, and physical harm was a result, how do we consider it?

I don't think you can remotely compare the two, and no physical harm is actually done. And if an extension stops working because it depends on a list, the list can be removed, the extension can be disabled, a different browser can be used. Ad blocking isn't an essential service that can't be easily replaced, and it isn't being provided as anything but a voluntary service with no uptime or availability assurances.

The made-up scenario of this preventing some critical task from being accomplished is a stretch at best.

True but I bet 99% of CloudFlare's income comes from companies that wish to see EasyList die in a fire. I'm pretty sure this would factor into their strict enforcement of the 'rules'. I mean, this is something between github and CloudFlare right? And github sure hosts a ton of other .txt files and other stuff that's not 'web content'. They don't enforce it so strictly with other sites.

Still, I'm sure the 'community' can figure out how to keep something like this online. I'd be happy to pony up some cash for decent hosting and I'm sure many would be. If that doesn't work out, something like ipfs, a torrent or whatever.

Correct. And let's not forget that the company which owns them would also like to see EasyList die in a fire.
Looks like it's fast to download now.
I am following up internally. Looks like there's a combination of this data not being cached and our systems thinking a DDoS was happening (which it sort of was). But getting the full story now.
I'm glad they sorted it out, but I wish there was a proper support route other than "create a sufficient media storm so that an employee tweets the CEO"
I can't understand their argument that a text file 'isn't a web content'; seems like a bullshit excuse.
This doesn’t sound like bullshit to me. Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms.

But it is web content, really. A txt file renders fine in every browser.

Many websites also push text through HTML as part of AJAXy stuff. If they actually enforced this for all sites, their service would no longer be usable.

Movies will render in browsers just fine too, doesn’t mean that cloudflare will allow you to cache them.
Sadly, despite my arguments in the same direction, Cloudflare refuses to host a base64 encoding of the new Flubber.
I can think of few things more static than a .txt file.
You're missing the point. The service cloudflare donates isn't free. That is the whole point of EasyList’s post. There are plenty of comments on this submission doing back of napkin math to find a reasonable monthly cost for hosting that text file. If you want to donate that bandwidth - go for it.

But the comments here about Cloudflare’s ToS read a lot like folks feeling entitled to getting bandwidth for free. Cloudflare is providing a very specific service for free, and it does a lot of good.

It would be great if Cloudflare decided to donate. But I’d re-evaluate your stance if you’re feeling entitled to their resources.

> You're missing the point.

You're missing the point. If Cloudflare's issue is with bandwidth, then they should say so and leave it at that, not conjure up this pathetic excuse about .txt files somehow not being "web content". Does wrapping that data in <html><body><pre> </pre></body></html> magically fix the bandwidth issues?

Even a plain text file without any tags is a valid HTML 5 file. You don't need any tags. <html> and <body> are implied.

All you'd need is a <pre> tag or a <style> somewhere if you'd want it rendered not just as one large paragraph.

I guess you may need a <!DOCTYPE html>

Just making a file valid HTML doesn't make it "web content". This file is being fetched by an application, not being viewed by a user.

I'm not sure this is the most reasonable rule, but there are definitely some beneficial aspects to it. For example, the load on human-viewed content is limited by how often people want to view it, not by how often their browser wants to redownload it.

Actually you're missing the point. It doesn't seem like many people are condemning Cloudflare for not serving a bandwidth-heavy file for free (FTA: "CloudFlare does not allow non-enterprise users use that much traffic").

Rather what's being condemned is this nonsense customer service characterization of a text file as somehow not "web content". Easylist.txt is a data file that could just as easily be in JSON (and be larger). Furthermore, as it stands easylist.txt actually looks like it's a valid text/html file, as browsers generally don't insist on <html>/<body> tags. So from both directions it seems like the customer service drone has thrown out this nonsense just to short circuit having to do their job.

I like the HN approach of taking a charitable interpretation of their message.

Clearly EasyList lived on their free tier for a long time without interruption. Only when they used excessive bandwidth did ToS enforcement happen. When they reached out for support, the support agent rightly pointed out that this isn't a website file.

Reading the ToS, the support agent's message appears to be correct. Text files are fine (as is pretty much any format) as long as they aren't the main focus of the HTTP server Cloudflare is fronting. Robots.txt would be fine, turning the list into XML or HTML would not be fine. In this case, the text file isn't there to support the web content of Easy List - it's distributing a text file to applications.

The agent could have added additional context but their message is valid.

But it's not JSON or HTML. And it's not meant for browsers. It's clearly a dataset as a text file and not meant as "web(page) content". What's nonsense about that when it's completely accurate?
A text file is a website file and that is what annoyed me in that support reply. The web is not just html, css and JS.

But in order to make the support eng do his job I need to add a .html extension to it?? Would that be considered a website file?

Interestingly, one could host this on a WWW frontend for Git. Then you'd only need to download (say) a daily diff. Why download the entire list when you can match a checksum?
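The checksum part could be sketched like this on the client side. It assumes the list host publishes a SHA-256 digest of the current file somewhere cheap to fetch; that endpoint and the `fetch_full` callable are hypothetical, since the thread does not say EasyList offers one:

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Hex digest of the cached list contents."""
    return hashlib.sha256(data).hexdigest()


def update_list(cached: bytes, published_checksum: str,
                fetch_full: Callable[[], bytes]) -> bytes:
    """Download the full list only when the cached copy no longer matches."""
    if sha256_hex(cached) == published_checksum:
        return cached  # unchanged upstream, nothing to download
    return fetch_full()
```

A plain HTTP conditional request (`If-None-Match` with an ETag) gives a similar saving without any custom endpoint.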
So if the text was embedded in a static webpage that the client had to parse locally, that'd be okay?
Does not inspire confidence in Cloudflare, that’s for sure.
I think CloudFlare pretty explicitly do not want people to be confident that they can serve 2 petabytes a month of API data on the free tier
That’s part of the problem though, isn’t it?

Because they certainly want to serve some huge amount of traffic for free while they attempt to become the next abusive monopoly platform.

They’re trying to have their cake and eat it too.

Maybe if they created a web page for easylist and then hosted that + the lists directly on CF pages, that would be considered web content?
It probably means that their DDoS protection needs to use JS to get some trust signals
Web content is for consumption by people.
Just tack on a .html file extension and add a <html> tag at top and bottom…problem solved
So does this mean any site with a security.txt file is violating Cloudflare's ToS?
How about robots.txt, since security.txt is looked at by humans, but robots.txt is almost exclusively looked at by non web browser clients.
The Pwned Passwords project by Troy Hunt is served from CloudFlare's cache. I don't know the scale of its bandwidth usage, but CloudFlare could definitely make a similar arrangement here too.
This is a bit different though. You are basically taking away a main revenue stream from websites, your main clients. That sounds like bad optics for them.
I understand, but my reply was in reference to this line in the parent comment:

> A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free right now.

Wouldn't their R2 service tick all the boxes for this one?

https://developers.cloudflare.com/r2/platform/pricing/

Sounds like they'd probably be in for at least $500/mo on this which doesn't seem like a lot if you're serving the amount of data EasyList is doing, but is a lot if your previous hosting costs were "free".
Most requests will be in the background or in Cron jobs. Captcha wouldn't be possible in those situations as it would never be seen by anyone.
I’m not sure a captcha would help though. These aren’t intentional attack requests, they’re “legitimate” requests by a clueless developer’s app that happened to get popular.

They just need to serve either an empty response or an intentionally broken rule to break the misbehaving browser and force its developers to fix it.

Yes there is of course that as well!
> EasyList is hosted on Github and proxied with CloudFlare. Unfortunately, CloudFlare does not allow non-enterprise users use that much traffic, and now all requests to the EasyList file are getting throttled.

> EasyList tried to reach out to CloudFlare support, but the latter said they could not help. Moreover, serving EasyList actually may violate the CloudFlare ToS.

Seeing the comments from Cloudflare here, looks like the HN machine has yet again worked its magic to get appropriate attention!

A captcha for all 600 million internet users seems like overkill. Maybe a smaller subnet range.
that would break everyone in India not using one of those broken browsers
They are already serving access denied replies, so I assume they can identify the browsers via user agent or similar?

If so, returning a bogus file that blocks everything and adding a comment in that list asking the developers to use caching or mirroring the file should be fine.
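For the app developers, the caching being asked for can be very small. A rough sketch, assuming a once-a-day refresh is acceptable; the class name, the default interval, and the injected `fetch` and `clock` callables are illustrative choices, not anything EasyList prescribes:

```python
import time
from typing import Callable, Optional


class CachedList:
    """Keep one copy of the list and refetch only after max_age_seconds."""

    def __init__(self, fetch: Callable[[], str],
                 max_age_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch
        self.max_age = max_age_seconds
        self.clock = clock
        self._body: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self.clock()
        # Hit the network only when there is no copy yet or the copy is stale.
        if self._body is None or now - self._fetched_at >= self.max_age:
            self._body = self.fetch()
            self._fetched_at = now
        return self._body
```

Persisting the cached copy to disk would additionally avoid a refetch on every app start, which is the pattern that reportedly caused the traffic here.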

I wonder if those browsers honor the list when fetching the update though. Would be awesome if you could just add easylist and lock out further requests right on the device.

Browser developers can choose to fake user-agents. Brave uses a generic chrome user agent so it cannot be differentiated from regular Chrome.
Everyone in the world is impacted if the site goes down under load. Changing that to everyone in a particular country (perhaps with a given user agent if the free plan allows expressions) would still be an improvement even if other work is needed.