Hacker News
by cm2187 2944 days ago
The irony is the "do as I say, not as I do" attitude.
2 comments

Google's web scrapers obey robots.txt, so you can stop Google from crawling your website if you want. Google doesn't want you crawling their website.

That word, I don't think it means what you think it means.

Google supports consensual scraping, and respects sites that opt out (using robots.txt), just as Google itself has. It's no more ironic than someone selling a product they don't happen to use themselves.
I think there's a credible argument that it's not purely consensual. Websites are forced to allow search engines with a lot of market share to scrape them or they won't be found.

No matter how well-intentioned you are, if you write your own scraper and have it abide by robots.txt, you'll never get access to nearly as many resources as Google or Bing do. Many websites approve only the big search engines' scrapers and ban everything else outright.
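
To make that concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URL, and the "MyCrawler" user agent are all made up for illustration and not taken from any real site; the point is only to show how a well-behaved scraper would be turned away by a file that whitelists the big crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow means "everything is allowed"
# for that user agent; "Disallow: /" bans every path for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page"  # placeholder URL
    for agent in ("Googlebot", "Bingbot", "MyCrawler"):
        print(agent, allowed(agent, page))
```

Under these assumptions the two named crawlers are allowed and the polite third-party scraper is refused, even though it is following the rules exactly.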

I don't have anything against the large search engines; it's just not really easy for most websites to say no to their scrapers.

I didn't consent to all this debt. It was just not really easy to say no to all these great credit cards.