|
Having spent a week battling a particularly inconsiderate scraping attempt, I’m quite unsurprised by the juvenile tone and fairly glib approach to the ethics of bots/scraping presented by the piece. For the site I work for, about 20-30% of our monthly hosting costs go towards servicing bot/scraping traffic. We’ve generally priced this into the cost of doing business, as we’ve prioritised making our site as freely accessible as possible. But after this week, where some amateur did real damage to us with a ham-fisted attempt to scrape too much too quickly, we’re forced to degrade the experience for ALL users by introducing captchas and other techniques we’d really rather not. |
I had a particularly bad time not so long ago, when a customer's site - a shop - was brought to its knees because someone, probably a competitor, hired some scraper-company of some sort to scrape every product and price.
The scraper would systematically go through every single product page.
And by scraper, I mean hundreds of them. All at the same time, using the old trick of one scraper requesting 3 or 4 product pages at a time, then pausing for a while.
They used umpteen different IP address blocks from all over the globe - but mainly OVH VPS IP address blocks from France.
Now, maybe if they'd just thrown, say, 5 or 10 of the scraper "units" at the site, no one would have noticed in amongst Googlebot (which the customer wanted crawling the site anyway, because they use Google Shopping to try to bring in more sales).
But no. This shower of arseholes threw hundreds of scraper "tasks" at the site. They got greedy.
Now, the site was robust enough to handle this massive load - barely. But having to do that /and/ also handle normal day-to-day traffic? Nah. The bastards got greedy and, like you, I spent a few days unfucking the damage they were causing.
Seriously, I hate scrapers. I hate the people who make scrapers. I hate their lack of ethics. Fuck those guys.