| Hi! This is definitely not legal advice, so consult with a lawyer and do your own research if you are thinking of applying this to your own practices. I work for Scrapinghub as well and try to understand the law around this. I can help with some pointers to why I think some web scraping isn't illegal… there are of courses some limits to this. When the data scraped is "is publicly available on the Internet, without requiring any login, password, or other individualized grant of access", the Eastern District Court of Virginia in Cvent vs Eventbrite (https://casetext.com/case/cvent-inc-v-eventbrite) ruled one could not be deemed to be exceeding unauthorized access. There are two ways, that I know of, that courts have ruled you can exceed your authorization: - When the site owner has contacted you and removed your authorization in a written manner, as happened on Craigslist vs 3taps. - By accepting the terms of service and agreeing against scraping. You have to do this through a "clickwrap" ToS, rather than a "browsewrap". You can read about the differences here: https://termsfeed.com/blog/browsewrap-clickwrap/ As a matter of policy, we don't scrape any site with a ToS with clear anti-scraping language and which forces us to create an account or "constructively agree" as part of the use of the site. Any user wishing to revoke authorization for anyone using our platform can make an abuse report on our site– we tend to handle these within 24 hours and haven't had a single claim go further than this stage, as we aim to be reasonable and look for a way to provide value to both sides. |
Most websites have anti-scraping boilerplate in their ToU. I'm pretty sure that's in the "customized" ToU I got from LegalZoom. So you're basically saying that if the data is not behind a login and doesn't require you to fill any forms that contain either a checkbox or nearby language that indicates submitting the form constitutes acceptance of the ToU, you'll scrape it, even if the ToU explicitly bans scraping.
What do you plan to do when you do receive a C&D from someone that claims you've agreed to their browsewrap ToU? Are you going to argue that your use is not unauthorized since the data is public?
I guess I assume you'll comply with any C&D since you state that receipt of a written removal constitutes a revocation of authorization. However, I don't believe that's enough for some. Check out QVC v. Resultly, where QVC sued on a CFAA claim based on browse-wrap agreements (they lost; Resultly asserted permission was granted by robots.txt and the Court agreed).
Beyond just CFAA claims, there are copyright and trademark claims attached to most scraping cases. Those have unfortunately succeeded most of the time. The most egregious is Ticketmaster v RMG, where it was ruled that RMG had violated Ticketmaster's copyright by downloading the page (specifically, making a copy of the page contents in RAM and extracting only non-copyrightable content, and then discarding the complete copy; in short, downloading the page). Facebook v. Power Ventures is also pretty brutal.
I hope Scrapinghub is well-funded enough to trailblaze some space in the law here, because we really need it.