Hacker News new | ask | show | jobs
by stummjr 3589 days ago
Crawl-delay is not in the standard robots.txt protocol, and according to Wikipedia, some bots have different interpretations for this value. That's why maybe many websites don't even bother defining the rate limits in robots.txt.
1 comments

I was referring to an actual rate limit, not crawl-delay. For example, YouTube is pretty strict about rate limits:

http://www.bing.com/search?q=%22We+have+been+receiving+a+lar...

I agree that crawl-delay is rare, and often it's set too long so that it's impossible to fully crawl a site -- as if the webmaster set it up 10 years ago and never updated it as their site got faster and bigger.