by buo 1466 days ago
I think this paragraph on the difficulty of building good independent indexes should not be overlooked. What's going on with Cloudflare?

> When talking to search engine founders, I found that the biggest obstacle to growing an index is getting blocked by sites. Cloudflare is one of the worst offenders. Too many sites block perfectly well-behaved crawlers, only allowing major players like Googlebot, BingBot, and TwitterBot; this cements the current duopoly over English search and is harmful to the health of the Web as a whole.

1 comments

Cloudflare isn't that bad in my experience. They were blocking me really aggressively when I started out, but there are some hoops[1] you can jump through to get them to recognize your bot. It goes a long way.

It does depend on the sites' settings though. Some are set to block all bots, and then you're kinda out of luck.

In general, I've found that like 99% of the problems you might encounter running a bot can be solved by just finding the right person and sending them an email explaining your situation. In almost all cases, they'll let you through.

[1] https://blog.cloudflare.com/friendly-bots/
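
For anyone wondering what "well-behaved" means on the crawler side, here is a rough sketch. It is not taken from the linked Cloudflare post and says nothing about how Cloudflare itself decides; it only shows the two usual basics: send a User-Agent that names your bot and points to a contact page, and check robots.txt before fetching. The bot name, the contact URL, and the sample robots.txt are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity; the name and contact URL are placeholders.
BOT_NAME = "ExampleIndexBot"
USER_AGENT = f"{BOT_NAME}/0.1 (+https://example.com/bot-info)"

# Made-up robots.txt for illustration; a real crawler would download
# this from the site's /robots.txt before crawling it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def is_allowed(robots_txt: str, bot_name: str, url: str) -> bool:
    """Return True if the given robots.txt text lets bot_name fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, url)


def request_headers() -> dict:
    """Headers to send with every request so site owners can identify the bot."""
    return {"User-Agent": USER_AGENT}


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/data.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, BOT_NAME, url))
```

The contact URL in the User-Agent is what makes the "send them an email" route work in both directions: a site owner who sees the bot in their logs has somewhere to go before reaching for a block rule.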

That's good to know -- thanks!