|
|
|
|
|
by tangue
3588 days ago
|
|
Reading the previous thread again, I suppose that many of those against scraping didn't realized they've already lost : with Ghost, Phantom, and now headless Chrome you're going to have a hard time to detect a well built scraper. Instead of fighting against scrapers that don't want to harm you, maybe it's about time to invest in your robots.txt and cooperate. You could say that scraping you're website is FORBIDDEN, but come on : if Airbnb can rent houses, I can scrap you site. |
|
Unfortunately, if you're scraping some data that only has one authoritative data source, they'll know you're scraping them even if they can't distinguish your individual requests from the general traffic.
This is what happened to my company. It didn't stop them from pretending that we were setting their servers on fire, even though they had no way to know whether we were or not since they couldn't distinguish our traffic from that generated by other browsers.
We were scraping only factual data in the which the company cannot hold a copyright interest. Nonetheless, under Ticketmaster v. RMG, just holding a copy of a page in RAM long enough to parse it constitutes infringement (you have to prove fair use, as Google supposedly did in Perfect 10 v. Google, to avoid this).
The difference between yourself and Google/airbnb is that the latter have a lot of money and are trendy technology companies, and you don't and aren't (yet).
The lesson is become really big before someone sues you and the judiciary will be on your side.