Hacker News
by tptacek 5013 days ago
That might be true, but it doesn't follow that you can lawfully scrape facts out of copyrighted content on someone else's website.

aklofas (auto-killed comment) asked:

I do believe that is exactly what search engines do. Are they breaking copyright?

This is not a simple question; major court cases have been fought over it. In general, these are the factors that get Google off the hook in such cases:

* Publishers have the ability to opt out of Google

* Where Google creates copies of information from other sites, those copies are provided to users noncommercially (i.e., Google doesn't make more money when you use its cache).

* Google uses DMCA Safe Harbor to avoid liability, which again turns in part on Google honoring opt-out requests from publishers.

* Google's use of the data is transformative, an idea that in part turns on it not being a direct substitute for the original.
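Mechanically, the publisher opt-out in the first point is usually expressed through a site's robots.txt file (the Robots Exclusion Protocol), which crawlers such as Googlebot consult before fetching a page. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical robots.txt that admits Googlebot but shuts out every other crawler; the rules, the example.com URLs, and the `ExampleScraper/1.0` user agent are illustrative, not taken from Craigslist or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while every other user agent is disallowed from the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleScraper/1.0"):
        print(agent, may_crawl(SAMPLE_ROBOTS_TXT, agent, "https://example.com/listings/123"))
```

Note that robots.txt is advisory: the parser only reports what the publisher asked for, and nothing technical stops a client from ignoring it, which is why the legal arguments above carry the weight.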

These are not generally arguments that bode well for PadMapper, which is effectively trying to compete with Craigslist using Craigslist data and a better interface. Publishers generally want Google to do things differently... but when push comes to shove, they also really want to be in Google's index. The same is not true for PadMapper.

How is this different from Feist v. Rural in your mind? According to that case, a fact is not copyrightable and is scrapable. I don't see how placing copyrighted content next to non-copyrighted content affords any protection to the non-copyrighted content.

First, Feist says nothing about content being "scrapable"; it's a 1991 case. To pull content off Craigslist against their will, you have to cross the CFAA.

Second, phone numbers are raw facts, but advertisements are not; every advertisement ever made has been copyrighted, and an entire 11-figure industry depends on that.

The distinction 3taps draws is that it doesn't scrape content from Craigslist itself, but from Google and Bing search results.

About a week or two after Craigslist filed the lawsuit against 3taps, it put a "noarchive" tag on its listings pages. Since then, its content hasn't been available in search engine caches.
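A "noarchive" tag is typically a robots meta tag such as `<meta name="robots" content="noarchive">`, which tells search engines not to keep a cached copy of the page. A minimal sketch of how a crawler might detect it with Python's standard `html.parser`; the sample markup is hypothetical, since the exact HTML Craigslist used isn't given here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in {"robots", "googlebot"}:
            # content is a comma-separated, case-insensitive directive list.
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html: str) -> bool:
    """Return True if the page asks search engines not to cache it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" in parser.directives


# Hypothetical listing page markup, for illustration only.
SAMPLE_LISTING = '<html><head><meta name="robots" content="NOARCHIVE"></head><body>...</body></html>'
```

A crawler that honors this directive can still index the page; it just won't serve its own cached copy, which is what cut off the cache route described above.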