by reefoctopus 4929 days ago
Many have suggested editing your robots.txt. This is absolutely the first step you should take. You could try blocking the crawlers by name, or limiting the request rate with a Crawl-delay directive in the robots.txt (not every crawler honors Crawl-delay, but it costs nothing to try).
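
If you want to sanity-check the rules before deploying them, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the crawler name "BadBot", and the helper load_rules are all made up for illustration; substitute whatever name shows up in your own logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is a made-up crawler name.
# The first group blocks that crawler entirely; the second asks
# everyone else to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("BadBot", "/index.html"))        # False: blocked by name
print(rules.can_fetch("SomeOtherBot", "/index.html"))  # True: only /private/ is off limits
print(rules.crawl_delay("SomeOtherBot"))               # 10
```

This only tells you what a well-behaved crawler would do with your file. It says nothing about whether the crawler hitting you actually reads it.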

If the crawler ignores your robots.txt, check its name (the user-agent string) in your access logs. Often, people build things and set them loose without thinking about the consequences. Many crawlers have a homepage or programmer contact information somewhere on the web. Let them know they are hammering your website.
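
To find the crawler's name, something like the following should work, assuming your server writes the common "combined" log format where the user agent is the last quoted field. The sample lines, the bot name, and the function count_user_agents are invented for the example; adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumes the "combined" access log format:
#   ip - - [time] "request" status bytes "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user-agent string; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, iterate over your access log file.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2012:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "BadBot/1.0 (+http://example.com/bot)"',
    '203.0.113.5 - - [10/Oct/2012:13:55:37 +0000] "GET /b HTTP/1.1" 200 734 "-" "BadBot/1.0 (+http://example.com/bot)"',
    '198.51.100.7 - - [10/Oct/2012:13:55:38 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

counts = count_user_agents(SAMPLE_LOG)
for agent, hits in counts.most_common():
    print(hits, agent)
```

Well-behaved crawlers often put a URL in the user-agent string, as in the invented example above, which is usually the quickest way to find a contact.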

What is the rate at which requests are being made? Are they making 1000 requests per second? Downloading tons of images? You should probably just ignore it if it is less than 1 request per second.
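
For a rough number to compare against that 1 request per second threshold, here is a sketch that averages requests over the time span covered by a set of log lines. It assumes timestamps in the usual access log form like [10/Oct/2012:13:55:30 +0000] and an English/C locale for the month name; the function name and sample data are made up. Filter the lines down to the one crawler first if you want its rate alone.

```python
import re
from datetime import datetime

# Assumes timestamps like [10/Oct/2012:13:55:30 +0000], as in common access logs.
TS_PATTERN = re.compile(r"\[([^\]]+)\]")
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def requests_per_second(lines):
    """Rough average rate: request count divided by the span between first and last hit."""
    times = []
    for line in lines:
        match = TS_PATTERN.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), TS_FORMAT))
    if len(times) < 2:
        return 0.0  # not enough data to estimate a rate
    span = (max(times) - min(times)).total_seconds()
    return len(times) / max(span, 1.0)  # treat sub-second bursts as one second


# Made-up sample: 5 requests spread over 10 seconds.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2012:13:55:30 +0000] "GET /a HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2012:13:55:32 +0000] "GET /b HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2012:13:55:35 +0000] "GET /c HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2012:13:55:38 +0000] "GET /d HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2012:13:55:40 +0000] "GET /e HTTP/1.1" 200 512',
]

rate = requests_per_second(SAMPLE_LOG)
print(f"{rate:.2f} requests/second")
```

An average hides bursts, so if the average looks harmless but the site still struggles, bucket the timestamps per second and look at the peak instead.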