by acdha
4220 days ago
FWIW, every time I've seen what looked like a major search engine ignoring rate limits (either Crawl-Delay or webmaster tools settings), a check of the actual IPs being used showed that it was someone spoofing a well-known User-Agent, which left you needing some other form of rate-limiting either way.
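One common way to do that IP check is forward-confirmed reverse DNS, which Google and Bing both describe for verifying their crawlers. First, look up the PTR name for the connecting IP. Then confirm the name sits under the engine's domain and resolves back to the same IP. The sketch below is a minimal illustration, not anything the commenter said they ran. The function name `is_genuine_crawler` and the `CRAWLER_SUFFIXES` table are hypothetical. Treat the domain suffixes as assumptions to check against each engine's current documentation.

```python
import ipaddress
import socket

# Hypothetical table of hostname suffixes for each crawler. Treat these as
# assumptions and check each engine's current docs before relying on them.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def is_genuine_crawler(ip: str, crawler: str) -> bool:
    """Return True only if `ip` passes forward-confirmed reverse DNS for `crawler`."""
    suffixes = CRAWLER_SUFFIXES.get(crawler)
    if suffixes is None:
        return False  # unknown crawler name: nothing to verify against
    try:
        claimed = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address at all
    try:
        # Step 1: reverse (PTR) lookup of the connecting IP.
        hostname = socket.gethostbyaddr(str(claimed))[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record: a spoofed User-Agent is likely
    if not hostname.endswith(suffixes):
        return False  # PTR name is outside the engine's domains
    try:
        # Step 2: forward lookup. The name must resolve back to the same IP;
        # otherwise anyone who controls their own PTR records could fake step 1.
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    for info in infos:
        addr = info[4][0].split("%")[0]  # drop any IPv6 scope ID
        try:
            if ipaddress.ip_address(addr) == claimed:
                return True
        except ValueError:
            continue
    return False
```

Each check costs two DNS lookups, so caching the result per IP is probably worthwhile. Requests that fail the check still need ordinary per-IP rate-limiting.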
In fact, for a while Bing (MSN bot back then) would crawl us every day at the same time, almost on the dot.
Let me plug Project Honey Pot (which I am in no way affiliated with). It is a truly awesome, surprisingly accurate, free service that does an amazing job of collecting heuristics on suspicious IP activity and exposing them in an easy-to-interpret way.
http://www.projecthoneypot.org/index.php
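Project Honey Pot also exposes its data through a DNS-based lookup, http:BL. As I understand its documentation, you resolve `<access key>.<octet-reversed IPv4>.dnsbl.httpbl.org`. A listed IP answers with `127.<days since last activity>.<threat score>.<visitor type>`. The visitor type is a bitmask: 0 means search engine, 1 suspicious, 2 harvester, 4 comment spammer. Verify all of this against the official docs. The helper names and the placeholder access key below are hypothetical. The DNS lookup itself is left out, and an unlisted IP is assumed to simply fail to resolve.

```python
import ipaddress
from typing import NamedTuple


class HttpBLResult(NamedTuple):
    """Decoded http:BL answer (field layout assumed from the http:BL docs)."""
    days_since_last_activity: int
    threat_score: int
    visitor_type: int  # bitmask: 1 suspicious, 2 harvester, 4 comment spammer

    @property
    def is_search_engine(self) -> bool:
        return self.visitor_type == 0

    @property
    def is_suspicious(self) -> bool:
        return bool(self.visitor_type & 1)

    @property
    def is_harvester(self) -> bool:
        return bool(self.visitor_type & 2)

    @property
    def is_comment_spammer(self) -> bool:
        return bool(self.visitor_type & 4)


def build_query(access_key: str, ip: str) -> str:
    """Build the DNS name to resolve; http:BL covers IPv4 only."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return f"{access_key}.{'.'.join(reversed(octets))}.dnsbl.httpbl.org"


def parse_response(answer: str) -> HttpBLResult:
    """Decode the A record returned for a listed IP."""
    first, days, threat, vtype = (int(o) for o in answer.split("."))
    if first != 127:
        raise ValueError(f"unexpected http:BL answer: {answer}")
    return HttpBLResult(days, threat, vtype)
```

For example, `parse_response("127.3.5.1")` decodes to an IP last seen 3 days ago with threat score 5, flagged as suspicious. Resolving the name from `build_query` (e.g. with `socket.gethostbyname`) is left to the caller.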