by z2amiller 5620 days ago
FWIW, most of the "Major Players" support the Crawl-Delay directive in robots.txt, see: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl...

I know at minimum GoogleBot and Yahoo! Slurp respect this value. I believe Google AdsBot (landing page URL verification for AdWords) also supports it, but GoogleBot and AdsBot each crawl independently at the rate you specify (so if you specify a Crawl-Delay of 1, each bot will crawl at ~1qps, for ~2qps combined). I don't know if it is in the spec, but fractional crawl delays appeared to be respected (Crawl-Delay: 0.25 would result in ~4qps, for example).
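To make the arithmetic concrete, here is a rough Python sketch of how a Crawl-Delay value maps to per-bot and combined request rates. The robots.txt contents, the function names, and the hand-rolled parsing are all made up for illustration; this is not how any particular crawler reads the file, and whether a given bot honors the directive (or fractional values) is something you would have to verify yourself.

```python
from typing import Optional

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 0.25
"""


def parse_crawl_delay(robots_txt: str) -> Optional[float]:
    """Return the first Crawl-delay value found, in seconds, or None.

    Deliberately simplistic: ignores User-agent grouping and accepts
    fractional values, which the directive may or may not officially allow.
    """
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "crawl-delay":
            return float(value.strip())
    return None


def crawl_rate_qps(crawl_delay_seconds: float) -> float:
    """Queries per second implied by one request every N seconds."""
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be positive")
    return 1.0 / crawl_delay_seconds


def combined_qps(crawl_delay_seconds: float, bot_count: int) -> float:
    """Total load if each bot applies the delay independently."""
    return bot_count * crawl_rate_qps(crawl_delay_seconds)
```

With the sample file above, parse_crawl_delay(ROBOTS_TXT) gives 0.25, which works out to ~4qps per bot; two bots each honoring a Crawl-Delay of 1 would add up to ~2qps.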

I too have fought with this problem - at times more than 90% of the capacity of the site I was running was devoted to serving up content for bots. I sympathize with small (and large) companies who don't want to add capacity just so that a new/random bot can add another <x> QPS to the daily baseline load.