


	by callahad 5431 days ago
	Something is twitching in the back of my mind about this. Sure, they can't look at the data based solely on the encrypted copy, but if they have a plaintext copy of a document of interest, they are able to determine which of their customers has that document, right? Doesn't that diminish some of the privacy claims?
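	To make the concern concrete, here is a minimal sketch of the check a provider could run, assuming the service uses convergent encryption (key derived from the file's own hash). The toy cipher, the `server_index` dictionary, and the customer names are all hypothetical, made up for illustration; this is not any real provider's scheme.

```python
import hashlib

def convergent_key(plaintext: bytes) -> bytes:
    # The key depends only on the content, so identical files get identical keys.
    return hashlib.sha256(plaintext).digest()

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Toy deterministic stream cipher (SHA-256 in counter mode).
    # Illustration only; a real system would use a vetted cipher.
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)

def storage_id(plaintext: bytes) -> str:
    # What the server sees: a hash of the (deterministic) ciphertext.
    ciphertext = toy_encrypt(convergent_key(plaintext), plaintext)
    return hashlib.sha256(ciphertext).hexdigest()

# Hypothetical server-side index: storage id -> customers holding that blob.
server_index: dict[str, set[str]] = {}

def upload(customer: str, plaintext: bytes) -> None:
    server_index.setdefault(storage_id(plaintext), set()).add(customer)

def who_has(known_plaintext: bytes) -> set[str]:
    # Anyone holding the plaintext can recompute the id and look it up.
    return server_index.get(storage_id(known_plaintext), set())

upload("alice", b"leaked memo v1")
upload("bob", b"holiday photos")
upload("carol", b"leaked memo v1")

holders = who_has(b"leaked memo v1")
```

	The server never decrypts anything here; the lookup works purely because the same plaintext always maps to the same stored id.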

3 comments

SODaniel 5431 days ago

For those interested in strict zero-knowledge and cross-account deduplication, we at SpiderOak wrote a post on the issue a while back:

https://spideroak.com/blog/20100827150530-why-spideroak-does...


JoachimSchipper 5431 days ago

Sure, but known-plaintext attacks are not the worst part. Consider this [found via http://www.mail-archive.com/cryptography@metzdowd.com/msg089...]: I take the standard WordPress wp-config.php [for your host], fill in your site and account name, try each of the one million most common database passwords, and ask the cloud provider whether any of the resulting hashes exist.

Or: I create a form (say .doc) with a single field, CC#, and hope people store this. I then check the existence of 10^11 hashes to find (all customers'!) credit card numbers (for a specific issuer). This takes only a CPU-day! (The network is obviously slower.)
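A rough sketch of that enumeration, assuming the id the server sees is a deterministic hash of the content. The form template, the card prefix, and the 4-digit search space are hypothetical and shrunk so the demo runs instantly; the last line just works out the hash rate implied by 10^11 hashes in one CPU-day.

```python
import hashlib

# Hypothetical one-field form; every customer's copy differs only in the number.
TEMPLATE = "Payment form\nCC#: {cc}\n"

def dedup_hash(document: str) -> str:
    # Assumption: the server-visible id is a deterministic function of content.
    return hashlib.sha256(document.encode()).hexdigest()

def enumerate_candidates(prefix: str, free_digits: int):
    # Yield every number that starts with the known prefix.
    for n in range(10 ** free_digits):
        yield prefix + str(n).zfill(free_digits)

def recover(stored_hashes: set[str], prefix: str, free_digits: int) -> list[str]:
    # Fill the template with each candidate and ask "does this hash exist?"
    return [
        cc
        for cc in enumerate_candidates(prefix, free_digits)
        if dedup_hash(TEMPLATE.format(cc=cc)) in stored_hashes
    ]

# Demo: one stored form, 12 known digits, 4 unknown (10^4 candidates).
victim = "4111111111110042"
stored = {dedup_hash(TEMPLATE.format(cc=victim))}
found = recover(stored, "411111111111", 4)

# Rate needed to get through 10^11 candidates in one day (86,400 s).
hashes_per_second = 10**11 / 86_400  # roughly 1.16 million per second
```

The loop is the whole attack: no key is ever broken, only guessed documents are tested for existence.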


Locke1689 5431 days ago

Eh. OK, let's say that instead of just SHA-256'ing the plaintext data to derive a key, you do 50,000 bcrypt rounds. Then the client encrypts the plaintext, hashes the ciphertext, and sends the hash to the server. If it takes 0.5 s to generate a single bcrypt key, it would take about 1,500 years of CPU time to work through the 10^11 candidates needed to find a credit card number.
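A quick check of that estimate, plus a sketch of what a slow content-derived key could look like. bcrypt is not in the Python standard library, so this swaps in PBKDF2-HMAC-SHA256 as a stand-in; its iteration count is not equivalent to bcrypt rounds, and the fixed salt is an assumption (a random per-user salt would stop identical files from deduplicating).

```python
import hashlib

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def slow_convergent_key(plaintext: bytes, iterations: int = 50_000) -> bytes:
    # Stand-in for the bcrypt step: PBKDF2 over the plaintext itself.
    # The salt is fixed and public (hypothetical value) so that the same
    # file always yields the same key on every client.
    return hashlib.pbkdf2_hmac("sha256", plaintext, b"dedup-demo-salt", iterations)

def brute_force_years(candidates: int, seconds_per_key: float) -> float:
    # Single-CPU time to derive a key for every candidate document.
    return candidates * seconds_per_key / SECONDS_PER_YEAR

# 10^11 candidate card numbers at 0.5 s per derived key.
years = brute_force_years(10**11, 0.5)  # about 1,584 years

key_a = slow_convergent_key(b"same file contents")
key_b = slow_convergent_key(b"same file contents")
```

The figure is for one CPU and an exhaustive pass; an attacker with many machines divides it accordingly.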


JoachimSchipper 5431 days ago

Sure, but then it takes 0.5s per file to check whether the server and client are in sync, too.
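For a sense of scale on the client side, a back-of-the-envelope sketch. The file count of 100,000 is a hypothetical example, and this assumes every file's key is re-derived on each check with nothing cached.

```python
def sync_check_hours(file_count: int, seconds_per_file: float = 0.5) -> float:
    # Time for one client to re-derive the slow key for every file.
    return file_count * seconds_per_file / 3600

# Hypothetical account with 100,000 files.
hours = sync_check_hours(100_000)  # just under 14 hours
```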


hadronzoo 5431 days ago

This paper, Secure Data Deduplication, has an overview and a security analysis: http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf
