Then we could sue Google for copyright infringement for caching our pages, I guess... Why would we not allow Google to cache it? Each cached page has a great big box on top saying that this is the historious version of the cache and linking to the original site...
By not excluding robots, you're opening yourself to all kinds of situations where you are responsible for draining revenue from the owner of the content, which leaves you liable to lawsuits. By contrast, the way that Google caches content and their rules surrounding it do not generally harm the copyright owner.
2) Google honors robots.txt, noarchive meta tags, and other indications that the author doesn't want the page to be cached. Is historious doing the same?
1) We do exclude robots now, yes.
2) historious doesn't spider websites, it only saves the pages the users give us. It's the same as a user deciding to make a backup of a webpage on their computer...
"It's the same as a user deciding to make a backup of a webpage on their computer..."
... and then publishing it on the Internet.
(This is not meant to be snarky or to imply opposition to your product at all. I think there is a meaningful difference between saving to a computer and saving to a web-accessible, apparently globally readable website.)
Isn't it the user's responsibility to obey copyright restrictions in this case, given that we never publish content unless the user does it? It's basically the same situation as hosting a website: if you upload and publish a copyrighted page, is the host responsible?
In my opinion, those two cases are not similar. I doubt that this type of automatic caching/publishing would have any protection under the DMCA safe-harbor provisions unless you're making it clear to users what they're doing (I'm not a user of the service, so maybe you already are).
If I understand correctly, the users of your site are simply bookmarking pages. You are then caching each page, storing it, and publishing it at a world-readable URL. There are many ways that you could provide the same experience to the user without making the cached page publicly accessible.
If you were to give users the option to make specific bookmarks world-readable - and you provided a disclaimer explaining that they should not make copyrighted material world-readable - then it might be different. But that's probably something you should discuss with an attorney.
Hmm, that's interesting... Another difference is that Google is doing it by itself, whereas historious only stores pages that users specify and only publishes them when the user specifies it.
We'll have a chat with our lawyer regardless, thank you!
Example: http://cache.historious.net/cached/515865/