by antonyh 121 days ago
Since facebookexternalhit is listed in the robots.txt, it does look like it's optimistically rechecking in the hope that it's no longer disallowed. That request rate is obscene, though, and falls firmly into the category of Bad Bot.

My guess is that it's dutifully obeying robots.txt, storing nothing from the site and then exiting, but without clearing the site from the crawl queue.
That is probably the dumbest yet most genius solution to getting your scraper blocked I've ever seen.
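To make the guess above concrete (the crawler obeys robots.txt but never removes the disallowed site from its crawl queue), here is a minimal, hypothetical sketch of how that bug could turn one disallowed URL into an endless stream of robots.txt checks. It is not Facebook's actual crawler. The robots.txt contents, the `crawl` function, its `clear_disallowed` flag and the example.com URL are all assumptions for illustration. Only `urllib.robotparser.RobotFileParser` comes from the Python standard library.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for illustration: blocks facebookexternalhit from the whole site.
ROBOTS_TXT = """\
User-agent: facebookexternalhit
Disallow: /
"""


def allowed(url: str, agent: str) -> bool:
    """Check the (assumed) robots.txt rules for this user agent and URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def crawl(urls, max_ticks, clear_disallowed, agent="facebookexternalhit"):
    """Hypothetical scheduler loop; returns (robots_checks, fetched_urls)."""
    queue = deque(urls)
    robots_checks = 0
    fetched = []
    for _ in range(max_ticks):
        if not queue:
            break
        url = queue.popleft()
        robots_checks += 1
        if allowed(url, agent):
            fetched.append(url)  # stand-in for actually fetching and storing the page
        elif not clear_disallowed:
            # The guessed bug: robots.txt is obeyed (nothing is stored), but the
            # site goes straight back onto the queue and is retried next tick.
            queue.append(url)
        # With clear_disallowed=True, a disallowed URL is simply dropped.
    return robots_checks, fetched
```

With `clear_disallowed=False`, a single disallowed URL is rechecked on every one of `max_ticks` scheduler ticks while nothing is ever fetched. With `clear_disallowed=True`, it is checked once and then dropped.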