	by o-__-o 1865 days ago
	This will scale wonderfully!

2 comments

jgrahamc 1865 days ago

No, what scales is us making our DDoS and bot detection not disrupt the crawling of legit search engines that respect robots.txt, don't crawl at ridiculous speeds, and don't do dumb stuff like pretend they are the Googlebot. We have teams who work on that. You can read more here: https://blog.cloudflare.com/tag/bots/

But let's suppose someone is building a new cool search engine and our ML stuff is blocking them. Then... contact us/me.

o-__-o 1865 days ago

So for my startup to crawl sites I must now adhere to Cloudflare’s Requirements of the Web(TM) or reach out to an individual engineer, who may leave at any moment. Gotcha.

(but Google is allowed because Google was first to market)

midev 1865 days ago

Why would you possibly think you can do whatever you want to someone else's site?

Yes, you must adhere to the controls that site administrators put in place, like Cloudflare.... You don't get to blast my site with requests, just because you want to...

o-__-o 1865 days ago

(a) Who said I was blasting your site with requests? Cloudflare stops much more than just blasts

(b) But you’re a-ok with Google doing this. Gated communities aren’t really good for anybody but I see what you are saying.

midev 1865 days ago

Gated communities are great. They lower the risk of crime significantly: https://www.sciencedaily.com/releases/2013/03/130320115113.h...

The same is true online. Apple's walled garden has kept hundreds of millions of people safe on their device. It's why iOS malware isn't a thing.

> Cloudflare stops much more than just blasts

Exactly. There's even more benefit to Cloudflare than just DDoS protection. CAPTCHAs for stopping credential stuffing, for example.

o-__-o 1865 days ago

..Didn’t realize my startup search engine stuffed credentials :(

But hey if I pay Cloudflare enough, then I’ll get to blast your site and possibly stuff creds at the same time :/

timlardner 1865 days ago

That doesn't sound unreasonable. Out of interest, what would you consider a ridiculous speed to be crawling at?

LinuxBender 1865 days ago

I can't speak for Cloudflare, but crawling speed should be dictated by the site owner via the robots.txt crawl-delay. [1] A site owner could also rate-limit unauthenticated requests by IP, using the client IP header that Cloudflare adds, and answer with a 429 Too Many Requests error page.

[1] - https://en.wikipedia.org/wiki/Robots_exclusion_standard#Craw...
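To make both halves of that concrete, here is a rough Python sketch, not anything Cloudflare actually ships. The first part reads a crawl-delay with the standard library's `urllib.robotparser`; the second is a toy in-memory limiter that returns 429 once an IP exceeds its budget. The robots.txt content, the `ExampleBot` user agent, the `crawl_policy` helper, and the `SlidingWindowLimiter` class are all made up for illustration. `CF-Connecting-IP` is assumed here as the header carrying the visitor's IP.

```python
import time
from collections import defaultdict, deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_policy(robots_txt, user_agent, default_delay=1.0):
    """Crawler side: return (parser, seconds to wait between requests)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return parser, (float(delay) if delay is not None else default_delay)


class SlidingWindowLimiter:
    """Site side: allow at most max_requests per window_seconds per IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def status_for(self, headers, now=None):
        # Assumption: the request arrived via Cloudflare, which sets this header.
        ip = headers.get("CF-Connecting-IP", "unknown")
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return 429  # Too Many Requests
        recent.append(now)
        return 200


parser, delay = crawl_policy(ROBOTS_TXT, "ExampleBot")
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
```

A polite crawler would sleep for `delay` seconds between fetches and skip any URL where `parser.can_fetch("ExampleBot", url)` is false. A real deployment would keep the limiter state somewhere shared rather than in one process's memory.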

o-__-o 1865 days ago

This here is the problem. It’s a new era: no one wants to be RFC compliant, they just go behind a service and the problem is solved.

So no problem, time to move on. Web search is no longer exciting.

77pt77 1865 days ago

It seems to be by design.