Hacker News
by bananaandapple 5745 days ago
They could sue you for copyright infringement if they wanted.

And allowing Google to index your "stolen" content pages is just outrageous. You don't own that content.

Then we could sue Google for copyright infringement for caching our pages, I guess... Why would we not allow Google to cache it? Each cached page has a great big box on top saying that this is the historious version of the cache and linking to the original site...

Example: http://cache.historious.net/cached/515865/

There's a huge difference between what Google is doing and what you're doing.

1) Google is caching pages for a specific purpose and ensuring that they aren't cached/scraped by others:

http://webcache.googleusercontent.com/robots.txt

By not excluding robots, you're opening yourself to all kinds of situations where you are responsible for draining revenue from the owner of the content, which leaves you liable to lawsuits. By contrast, the way that Google caches content and their rules surrounding it do not generally harm the copyright owner.

2) Google honors robots.txt, noarchive meta tags, and other indications that the author doesn't want the page to be cached. Is historious doing the same?
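For what it's worth, here is a minimal sketch of what "honoring" those two signals could look like before saving a copy of a page. It uses only Python's standard library (`urllib.robotparser` and `html.parser`). The function name `may_cache` and the user agent string `example-cache-bot` are made up for illustration, and this is not a claim about how Google or historious actually implement the check.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_cache(robots_txt, page_html, url, user_agent="example-cache-bot"):
    """Return True only if robots.txt allows the URL and the page has no noarchive tag.

    `user_agent` is a hypothetical name chosen for this sketch.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False
    meta = RobotsMetaParser()
    meta.feed(page_html)
    return "noarchive" not in meta.directives
```

A real service would also have to fetch the site's robots.txt and the page itself; the sketch takes both as strings so that the decision logic stays separate from the networking.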

1) We do exclude robots now, yes. 2) historious doesn't spider websites, it only saves the pages the users give us. It's the same as a user deciding to make a backup of a webpage on their computer...

"It's the same as a user deciding to make a backup of a webpage on their computer..."

... and then publishing it on the Internet.

(This is not meant to be snarky or to imply opposition to your product at all. I think there is a meaningful difference between saving to a computer and saving to a web-accessible, apparently globally readable website.)

Isn't it the user's responsibility to obey copyright restrictions in this case, given that we never publish content unless the user does it? It's basically the same situation as hosting a website: if you upload and publish a copyrighted page, is the host responsible?

In my opinion, those two cases are not similar. I doubt that this type of automatic caching/publishing would have any protection under the DMCA safe-harbor laws unless you're making it clear to users what they're doing (I'm not a user of the service, so maybe you already are).

If I understand correctly, the users of your site are simply bookmarking pages. You are then caching each page, storing it, and publishing it with a world-readable URL. There are many ways that you could provide the same experience to the user without making the cached page publicly accessible.

If you were to give users the option to make specific bookmarks world-readable - and you provided a disclaimer explaining that they should not make copyrighted material world-readable - then it might be different. But that's probably something you should discuss with an attorney.
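To make that suggestion concrete, here is a rough sketch of a "private by default" model in Python. Every name in it (`Bookmark`, `can_view`, `publish`, `accepted_disclaimer`) is hypothetical and only illustrates the idea; it says nothing about how historious is actually built, and it is no substitute for the attorney.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    owner: str
    url: str
    cached_html: str
    is_public: bool = False  # private unless the owner explicitly opts in


def can_view(bookmark, viewer):
    """The owner always sees the cached copy; others only if it was published."""
    return viewer == bookmark.owner or bookmark.is_public


def publish(bookmark, accepted_disclaimer):
    """Make a bookmark world-readable only after the disclaimer is accepted."""
    if not accepted_disclaimer:
        raise PermissionError("disclaimer must be accepted before publishing")
    bookmark.is_public = True
```

With this shape, an anonymous visitor (for example `viewer=None`) gets nothing until the owner has called `publish` for that specific bookmark.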

If someone uses your service to republish a few dozen News Corp pages, then sends them the link, I reckon you'll be in court before sunset.

Edit: I think it's a great idea though to save bookmarked content, just not to republish it without permission.

Indeed, what Google is doing (caching a page and showing it to the user) is copyright infringement in some countries (e.g. Belgium, ...).

There hasn't been any case against them, but theoretically someone could sue them. Who would win is a different story.

Hmm, that's interesting... Another difference is that Google is doing it by itself, whereas historious only stores pages that users specify and only publishes them when the user specifies it.

We'll have a chat with our lawyer regardless, thank you!

Google honours robots.txt.