The problem is "the law" is murky on web scraping. For example, did you know that even if your users are only extracting non-copyrighted (even non-copyrightable) data from a page, a judge once ruled that the act of storing the entire page in RAM constituted copyright infringement? The reasoning was that the page contained some copyrighted elements (like the company's logo), even though those were disposed of immediately after extraction. This was Ticketmaster v. RMG Technologies, and it was used against Power Ventures in Facebook's case against them.

Contrast that with Feist Publications, Inc. v. Rural Telephone Service Co., where the Supreme Court ruled that it was legal to copy listings from a phone book and republish them, since they were non-copyrightable factual data.

There are several other ridiculous early rulings that were made while the internet was still coming of age, and I think before many judges really understood the way it worked. Recent cases have been bucking these precedents, but you can still get the book thrown at you based on those rulings.

Read about 3Taps and please understand that you will be sued, as they were, unless you fold the moment you get a C&D, which would make your site fairly useless.

Google and all other search engines are illegal in the US in most cases. They just don't get in trouble for most of their activity because people usually want to be on Google. If you end up collecting data in a way that someone doesn't like, things won't go so well for you. See Facebook Inc. v. Power Ventures, Inc. That guy got raked over the coals; I'm sure Facebook was trying to make an example of him.

Data portability is a threat to the business model of many web incumbents, so they want scraping, a critical tool for ensuring that portability, to remain in a nebulous grey area. That lets them use scraping for their own purposes (which they often do) while blocking people who use data found on their platform in a way they don't like. The result is basically that the bigger company gets its way, because only other multi-billion-dollar companies really have the resources to fight the army of $1k/hr lawyers that public companies hire to enforce their opinions on upstarts.

What we really need is serious internet law reform that favors a fair and open platform. Unfortunately, whenever we hear about "internet law reform", it's skewed to the interests of the megacorps who want more tools to shut down innovators that may threaten their business models, not toward creating an open and fair environment for innovation.

Consider, for instance, how ridiculous it would be if every time you opened a book, one of the title pages contained a "Terms of Reading" that bound you not to use the information in the book, even the non-copyrightable information, in any way the publisher didn't like. It would require you to read the book only by the publisher's approved methods (perhaps only Oakleys and Ray-Bans are publisher-approved eyeglasses, only Herman Miller is publisher-approved seating, and only GE bulbs are publisher-approved lighting). It would also require you to agree never to sue the publisher in court, and instead to use private arbitrators that the publisher can easily, even implicitly, buy off. And so forth.

Consider the viability of the argument that you committed copyright infringement by looking at the pages of the book when the author didn't want you to, because the reflection of the content on your eyes constituted an illegal copy.

These things would get laughed out of court, but the digital equivalent is frequently upheld when it comes to online activity.

I think eventually things will stabilize and scraping non-copyrighted data will unambiguously not be a crime, but unfortunately, I think it may still be a few more decades until that happens. I really hope your company is able and willing to help us set the right precedents by committing the tens of millions it will take to win each piece of that stability, since you're set up so perfectly to be the target of several scraping-related lawsuits.

Recent rulings, like QVC v. Resultly and Nguyen v. Barnes & Noble Inc., have been much more positive than earlier ones, even if they're not altogether ideal, indicating that some magistrates are starting to think of the internet in sensible terms. The rest has to be done through the legislature. Please help make the web safe for data.

IANAL