Hacker News
by mmaunder 5385 days ago
I would argue that you can either have data de-duping or encryption, but not both.

Suppose encryption is defined as transforming data so that only people with special knowledge can read it.

If you can then compare one chunk of encrypted data against another to determine the source data, you have very weak encryption: anyone with a large enough repository of user files could brute-force it.
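
A rough sketch of the comparison, assuming the service derives each file's key from the file's own contents (often called convergent encryption). That is an assumption about how the de-duping works, not something confirmed here. The `keystream` and `convergent_encrypt` helpers are hypothetical, and the cipher is a toy built on SHA-256 purely for illustration:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def convergent_encrypt(plaintext: bytes) -> bytes:
    """Assumed scheme: the key is a hash of the plaintext, so encryption is deterministic."""
    key = hashlib.sha256(plaintext).digest()
    ks = keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

# Two users upload the same file; a third uploads something else.
alice_upload = convergent_encrypt(b"popular_song.mp3 contents")
bob_upload = convergent_encrypt(b"popular_song.mp3 contents")
other_upload = convergent_encrypt(b"private diary contents")

# Identical plaintexts give identical ciphertexts, which is what makes
# de-duping possible and also what lets an outsider confirm a guess.
same_file_detected = alice_upload == bob_upload
different_file_detected = alice_upload == other_upload
```

Anyone who already holds a copy of the file can run the same function and check whether the resulting ciphertext is in the store, without ever being given a key.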

1 comment

While technically correct, that's not a practical observation. A memory bank storing your “large enough repository of user files” would consume the entire universe.

That said, people don't store random bitstrings. People store music on these shared storage services. If I were a big media company, I could find all the MP3s of songs I own floating around P2P networks, compute their encrypted forms, and subpoena the storage company for the user accounts storing any one of the files. People have also been known to synchronize application data, including files with secret keys or passwords, which in this case effectively shares a hash of the password. That's better than Dropbox, but if the key plus the normal variation in the file doesn't carry enough entropy, an attacker could still brute-force the contents of the file.
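
To make the low-entropy case concrete, here is a minimal sketch. It assumes the attacker knows the file's layout (the `TEMPLATE` below is made up), that the only unknown is a 4-digit secret, and that the attacker can observe some content-derived identifier for the stored file. `dedup_tag` is a hypothetical stand-in for that identifier, not any real service's API:

```python
import hashlib
from itertools import product
from string import digits

def dedup_tag(plaintext: bytes) -> str:
    """Hypothetical stand-in for whatever content-derived value the service compares."""
    return hashlib.sha256(b"dedup-tag:" + plaintext).hexdigest()

# Assumed file layout: everything is predictable except a 4-digit secret.
TEMPLATE = "api_password={}\n"

def recover_secret(observed_tag: str):
    """Try every 4-digit secret until the recomputed tag matches the observed one."""
    for candidate in product(digits, repeat=4):
        guess = "".join(candidate)
        if dedup_tag(TEMPLATE.format(guess).encode()) == observed_tag:
            return guess
    return None

# The victim syncs a config file; the attacker only sees its tag.
victim_tag = dedup_tag(TEMPLATE.format("4821").encode())
recovered = recover_secret(victim_tag)
```

The search space here is only 10,000 candidates, so the loop finishes almost instantly. The attack stops being practical only once the unknown part of the file has far more entropy than that.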

EDIT: Those are just potential real-world attacks I can think of on the spot; I'm sure there are plenty of others. While this is certainly (marginally) better than Dropbox, real security and data de-duplication are mutually exclusive.