So what is the legality of this? Apart from the risk of someone pulling the plug on whatever method you use to extract the information, when can something without a proper license be used?
In the US, there is no copyright protection for "facts" on their own. However, a compilation/database of facts can have copyright protection based on a three-part test[0]:
1. the collection and assembly of pre-existing material, facts, or data;
2. the selection, coordination, or arrangement of those materials; and
3. the creation, by virtue of the particular selection, coordination, or arrangement, of an original work of authorship.
But specifically there is no protection for the underlying facts themselves, and there is no "sweat of the brow" doctrine. So scraping the data, and rearranging the underlying facts into your own arrangement/organization is almost always not copyright infringement. However, if that data is categorized in some non-trivial way, and you keep that organization, then that is likely to be copyright infringement.
However, if what you're scraping is not "facts" but creative works, such as blog posts or product descriptions, then it is likely to be copyright infringement.
Then on top of that, even if there is copyright infringement, defenses such as a license to use the data or fair use may apply.
> So scraping the data, and rearranging the underlying facts into your own arrangement/organization is almost always not copyright infringement.
I'm not so sure. It would definitely be illegal in the US for me to cherry pick data out of Google Maps and add it to OpenStreetMap (and OSM has policies addressing exactly this).
No one in the US can hold copyright in pure "facts", especially if one demonstrates they invested enough energy to "creatively reinterpret" them. Scraping hasn't quite seen a Supreme Court ruling yet (@grellas correct me, please), but I'm sure one could make a reasonable argument that the energy invested in re-collating the data is enough to pass any barrier. See Feist Publications, Inc. v. Rural Telephone Service Co. (1991) and O'Connor's opinion.
IANAL but in the EU at least, even databases comprised of simple "facts" are protected.
It's a sad state of affairs when I'm not even allowed to scrape data generated using taxpayers' money, like the noise maps for cities (required by EU law), which I'd like to use to augment real estate offers, for example.
[0] - http://www.pddoc.com/copyright/compilation.htm