by harperlee 3740 days ago
So what is the legality of this? Apart from the risk of having someone pull the plug on the way one takes the information out, when is something without a proper license able to be used?

In the US, there is no copyright protection for "facts" on their own. However, a compilation/database of facts can have copyright protection based on a three-part test [0]:

    1. the collection and assembly of pre-existing material, facts, or data;
    2. the selection, coordination, or arrangement of those materials; and
    3. the creation, by virtue of the particular selection, coordination, or arrangement, of an original work of authorship.
But specifically there is no protection for the underlying facts themselves, and there is no "sweat of the brow" doctrine. So scraping the data, and rearranging the underlying facts into your own arrangement/organization is almost always not copyright infringement. However, if that data is categorized in some non-trivial way, and you keep that organization, then that is likely to be copyright infringement.

However, if what you're scraping is not "facts" but creative works, such as blog posts or product descriptions, then it is likely to be copyright infringement.

Then on top of that, even if there is copyright infringement, other defenses such as a license to use the data, or fair use may apply.

[0] - http://www.pddoc.com/copyright/compilation.htm

> So scraping the data, and rearranging the underlying facts into your own arrangement/organization is almost always not copyright infringement.

I'm not so sure. It would definitely be illegal in the US for me to cherry-pick data out of Google Maps and add it to OpenStreetMap (and OSM has policies addressing exactly this).

Yet companies like LexisNexis get most of the data they resell this way.
Are they scraping copyrighted data? Or public records? Big difference.
No one in the US can hold copyrights to the pure 'facts', especially if one demonstrates they invested enough energy to 'creatively reinterpret' them. Scraping hasn't quite seen a Supreme Court ruling yet (@grellas correct me, please), but I'm sure one could make a reasonable argument that the energy invested in re-collating the data is sufficient to pass any barrier. See Feist Publications, Inc. v. Rural Telephone Service Co. (1991) and O'Connor's opinion.
Facts aren't copyrightable.

They scrape everything in the world they can get their hands on.

What part of the law does this fall under? Do people get arrested for this? (i.e. criminal) What's the worst that can happen?
That's begging the question of whether Google's data on public streets is actually protected by copyright under U.S. law.
IANAL but in the EU at least, even databases comprised of simple "facts" are protected.

It's a sad state of affairs when I'm not even allowed to scrape data generated using taxpayers' money, like the (required by EU laws) noise maps for cities, which I'd like to use to augment real estate offers, for example.

"Europe" would like to partially fund that noise database with income from businesses that use it. The result is that less taxpayer money is needed.

I think it's only the UK that has copyrightable fact databases.

Except that doesn't happen, because the last thing a new business idea needs is more red tape, paperwork and expenditure.