Hacker News
by Bulk70 4057 days ago
Serious question, because I can't imagine your use case: under what circumstances would you want to block all bots except one?
5 comments

Facebook blocks all bots except a select few to prevent site scraping - https://www.facebook.com/robots.txt
I'm not an expert but my guess is: limiting bot traffic, but keeping the site available for the most popular search engine.
My experience is that the worst bots don't respect robots.txt anyway.
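For reference, here is a minimal sketch of what "block all bots except one" looks like in robots.txt terms, checked with Python's standard `urllib.robotparser`. The rules, the `is_allowed` helper, the `SomeOtherBot` name and the example.com URL are all made up for illustration; this is not Facebook's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything,
# every other crawler is asked to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("Googlebot"))     # True: empty Disallow means nothing is blocked
print(is_allowed("SomeOtherBot"))  # False: falls through to the catch-all rule
```

Note that this only models what a well-behaved crawler would conclude. Nothing here is enforced; a bot that ignores robotos.txt-style rules has to be stopped some other way, such as at the firewall.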

Getting crawled by the major search engines typically isn't that bad; they tend to know what they're doing. Getting hammered by some crappy local search engine is what's annoying.

We don't limit any bots, except once when we completely blocked Eniro in our firewall. Google, Bing and a ton of others could index at the same time with no issue. Eniro, for some reason, decided to index way too much at once, with no reaction to robots.txt and no reply from the email address they so kindly included in the headers.

But I see your point, it's just a bit sad when Google has become "The Internet".

I thought FB was the internet. Googlebot is just the Kleenex of indexers.
Owner of GOOG stock, maybe?
Maybe for ethical reasons or some kind of exclusive agreement with one particular search vendor?
Is it of any importance?