HN Mirror

	by greglindahl 3589 days ago
	On today's web, sites like Amazon are so large that you need many crawlers to cover them. On the plus side, almost all large sites appear to have no rate limits.

1 comments

stummjr 3589 days ago

Crawl-delay is not part of the standard robots.txt protocol, and according to Wikipedia, different bots interpret the value differently. That may be why many websites don't even bother defining rate limits in robots.txt.
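
For anyone who wants to check whether a site sets Crawl-delay at all, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `examplebot` user agent are made up for illustration, and `crawl_delay_for` is a hypothetical helper name. How to act on the value is up to each crawler, since the directive is non-standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "examplebot"))
```

A site that never defines the directive yields `None`, which a crawler would have to replace with its own default politeness delay.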

greglindahl 3589 days ago

I was referring to an actual rate limit, not crawl-delay. For example, YouTube is pretty strict about rate limits:

http://www.bing.com/search?q=%22We+have+been+receiving+a+lar...

I agree that crawl-delay is rare, and it's often set so long that it's impossible to fully crawl a site -- as if the webmaster set it up 10 years ago and never updated it as their site got faster and bigger.
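
To put rough numbers on "impossible to fully crawl", here is a back-of-the-envelope sketch. It assumes a single crawler that fetches one page at a time and waits the full crawl-delay between requests. The page count and delay are hypothetical figures, not measurements of any real site, and `full_crawl_days` is an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def full_crawl_days(page_count, crawl_delay_seconds):
    """Days needed to fetch every page once, one request per crawl-delay."""
    return page_count * crawl_delay_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical site: one million pages, Crawl-delay of 10 seconds.
    days = full_crawl_days(1_000_000, 10)
    print(f"{days:.1f} days")
```

Under those assumptions a single pass takes close to four months, so on a site that changes daily the crawl would never catch up.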