by metalruler 4935 days ago
From a site owner's perspective: if you have a LOT of data then scraping can be very disruptive. I've had someone scraping my site for literally months, using hundreds of different open proxies, plus multiple faked user-agents, in order to defeat scraping detection. At one point they were accessing my site over 300,000 times per day (about 3.5 requests per second), which exceeded the level of the next busiest (and welcome) agent: Googlebot. In total I estimate this person has made more than 30 million fetch attempts over the past few months. I eventually figured out a unique signature for their bot and blocked 95%+ of their attempts, but they still kept trying. I managed to find a contact for their network administrator and the constant door-knocking finally stopped today.
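For anyone who wants to sanity-check the numbers above, here is a rough sketch of the arithmetic. The function names are made up for illustration, and it assumes the peak rate was sustained evenly over a day, which real traffic probably was not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(requests_per_day: float) -> float:
    """Average requests per second for a given daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def days_to_reach(total_requests: float, requests_per_day: float) -> float:
    """Days needed to reach a total at a constant daily rate."""
    return total_requests / requests_per_day


# Figures quoted in the comment: 300,000 requests/day at peak,
# and an estimated 30 million fetch attempts overall.
peak_rate = per_second(300_000)
min_days = days_to_reach(30_000_000, 300_000)

print(f"{peak_rate:.2f} requests/sec at peak")
print(f"{min_days:.0f} days at the peak rate to reach 30 million")
```

At the peak rate, 30 million requests would take about 100 days, which lines up with "the past few months" as long as the scraper ran near that rate for most of the period.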