by cookiecaper
3587 days ago
>Reading the previous thread again, I suppose that many of those against scraping didn't realize they've already lost: with Ghost, Phantom, and now headless Chrome, you're going to have a hard time detecting a well-built scraper.

Unfortunately, if you're scraping data that has only one authoritative source, they'll know you're scraping them even if they can't distinguish your individual requests from the general traffic. This is what happened to my company. It didn't stop them from pretending that we were setting their servers on fire, even though they had no way to know whether we were, since they couldn't distinguish our traffic from that generated by other browsers.

We were scraping only factual data, in which the company cannot hold a copyright interest. Nonetheless, under Ticketmaster v. RMG, just holding a copy of a page in RAM long enough to parse it constitutes infringement (you have to prove fair use, as Google supposedly did in Perfect 10 v. Google, to avoid this).

The difference between yourself and Google/airbnb is that the latter have a lot of money and are trendy technology companies, and you don't and aren't (yet). The lesson is: become really big before someone sues you, and the judiciary will be on your side.
|
How would they know you're scraping them?
Surely the capability of any given website admin to detect a particular scraper would depend on many factors such as whether they're even looking for scrapers or are technologically capable of doing so, how many/which IPs the scraping is originating from, and how cleverly the scraper goes about their scraping, no?
It's a bit of a cat and mouse game, wouldn't you say?