Hacker News
by callahad 5385 days ago
Something is twitching in the back of my mind about this. Sure, they can't look at the data based solely on the encrypted copy, but if they have a plaintext copy of a document of interest, they are able to determine which of their customers has that document, right?

Doesn't that diminish some of the privacy claims?
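
To make the concern concrete, here is a minimal sketch, assuming the provider deduplicates on an identifier derived only from the file's contents (the key is a hash of the plaintext, and the stored blob is named by a hash of the ciphertext). The cipher is a toy stand-in built from SHA-256, and `index`, `upload`, and `who_has` are hypothetical names, not any provider's actual API.

```python
import hashlib

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def blob_id(plaintext: bytes) -> str:
    """Identifier the server sees: hash of the ciphertext, keyed by the plaintext's hash."""
    key = hashlib.sha256(plaintext).digest()
    ciphertext = toy_encrypt(key, plaintext)
    return hashlib.sha256(ciphertext).hexdigest()

# Hypothetical server-side index: blob id -> set of customer names holding that blob.
index = {}

def upload(customer: str, plaintext: bytes) -> None:
    index.setdefault(blob_id(plaintext), set()).add(customer)

def who_has(plaintext: bytes) -> set:
    """Anyone holding the plaintext can recompute the id and look up its owners."""
    return index.get(blob_id(plaintext), set())

upload("alice", b"document of interest")
upload("bob", b"document of interest")
upload("carol", b"something else entirely")
owners = who_has(b"document of interest")  # alice and bob, but not carol
```

The server never decrypts anything here; the identifier alone is enough to answer "who has this file?".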

3 comments

For those interested in strict zero-knowledge and cross-account deduplication, we at SpiderOak wrote a post on the issue a while back:

https://spideroak.com/blog/20100827150530-why-spideroak-does...

Sure, but known-plaintext attacks are not the worst part. Consider this [found via http://www.mail-archive.com/cryptography@metzdowd.com/msg089...]: I take the standard WordPress config.php [for your host], fill in your site and account name, fill in each of the one million most common database passwords in turn, and ask the cloud provider whether any of the resulting hashes exist.

Or: I create a form (say, a .doc) with a single field, CC#, and hope people store it. I then check for the existence of 10^11 hashes to find (all customers'!) credit card numbers (for a specific issuer). This takes only a CPU-day! (The network is obviously slower.)
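
The two attacks above share one shape: a known template with a single low-entropy field, plus a way to ask whether a given hash exists. A minimal sketch, assuming the deduplication identifier is a plain SHA-256 of the file and using a deliberately tiny search space; `TEMPLATE`, `recover_field`, and the `exists` callback are hypothetical stand-ins for the real file format and the provider's existence check.

```python
import hashlib
from typing import Callable, Iterable, Optional

def blob_id(plaintext: bytes) -> str:
    # Stand-in for whatever deterministic identifier the provider deduplicates on.
    return hashlib.sha256(plaintext).hexdigest()

def recover_field(
    template: str,
    candidates: Iterable[str],
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Fill the template with each candidate and ask whether that file is stored."""
    for guess in candidates:
        if exists(blob_id(template.format(guess).encode())):
            return guess
    return None

# Hypothetical form with one unknown field (only the last four digits vary here).
TEMPLATE = "CC#: 0000-0000-0000-{}\n"

# Simulated provider state: some customer stored the form with "4821" filled in.
stored_ids = {blob_id(TEMPLATE.format("4821").encode())}

candidates = (f"{n:04d}" for n in range(10_000))
found = recover_field(TEMPLATE, candidates, stored_ids.__contains__)
```

With 10^4 candidates this finishes instantly; the 10^11 case is the same loop with a larger range.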

Eh. OK, let's say that instead of just SHA-256'ing the plaintext data to derive a key, you do 50,000 bcrypt rounds. Then the client encrypts the plaintext, hashes the ciphertext, and sends the hash to the server. If it takes 0.5 s to generate a single bcrypt key, it would take about 1,500 years (10^11 guesses at 0.5 s each) to find a single credit card number.

Sure, but then it takes 0.5 s per file to check whether the server and client are in sync, too.
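
The figures in this exchange can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted above; the 100,000-file library is a hypothetical value chosen to show the cost on the honest client's side.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

candidates = 10**11          # size of the credit card search space quoted above
slow_cost = 0.5              # seconds per derived key, as quoted above

# "A CPU-day" for 10^11 plain hashes implies roughly 1.16 million hashes per second.
fast_rate = candidates / SECONDS_PER_DAY

# With a 0.5 s key derivation, exhausting the same space takes on the order of 1,500 years.
slow_years = candidates * slow_cost / SECONDS_PER_YEAR

# The honest client pays the same 0.5 s for every file it needs to check.
files = 100_000              # hypothetical library size
sync_hours = files * slow_cost / 3600

print(f"{fast_rate:,.0f} hashes/s, {slow_years:,.0f} years, {sync_hours:.1f} hours")
```

The slowdown that stops the attacker applies equally to every sync, which is the trade-off the reply points out.
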
This paper, "Secure Data Deduplication", has an overview and a security analysis: http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf