by greglindahl 3589 days ago
On today's web, sites like Amazon are so large that you need many crawlers to cover them. On the plus side, almost all large sites appear to have no rate limits.

Crawl-delay is not part of the standard robots.txt protocol and, according to Wikipedia, some bots interpret the value differently. That may be why many websites don't bother defining rate limits in robots.txt.
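For anyone who wants to see what reading the value looks like, here is a minimal sketch using Python's standard urllib.robotparser, which exposes Crawl-delay when a site defines it. The robots.txt content and the "examplebot" user agent are made up for illustration, and this only shows how one parser reads the field, not how any particular bot interprets it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


delay = crawl_delay_for(ROBOTS_TXT, "examplebot")
print(delay)
```

A site that leaves the directive out yields None, so a crawler still has to pick its own default pacing.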
I was referring to an actual rate limit, not crawl-delay. For example, YouTube is pretty strict about rate limits:

http://www.bing.com/search?q=%22We+have+been+receiving+a+lar...

I agree that crawl-delay is rare, and it's often set so long that it's impossible to fully crawl the site -- as if the webmaster set it 10 years ago and never updated it as the site got faster and bigger.
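To put rough numbers on that, here is a back-of-the-envelope sketch. The page count and delay are hypothetical, and it assumes a single crawler fetching one page per delay interval with zero fetch time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_crawl(pages, delay_seconds):
    """Days needed to fetch `pages` URLs when waiting `delay_seconds` between requests."""
    return pages * delay_seconds / SECONDS_PER_DAY


# Hypothetical site: one million pages with a 10 second crawl-delay.
print(round(days_to_crawl(1_000_000, 10), 1))  # about 115.7 days
```

At that pace a site that keeps adding pages can grow faster than a compliant crawler can cover it.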