| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by candiodari 3078 days ago

The whole point is to allow for comparison of large plaintext files that are stored by many users. Think of mp3s, or large avi files, or, say, a linux kernel image, or ...

> The server sees if it knows of the existence of records with the same hpFileTrunc value as the client's submission. If so, it returns them to the client.

And by doing this, provides a way for clients to verify if any user on the file storage server has this file. So if I wanted to know if your mozilla thunderbird has a mail I have the source to, I simply try to store this and get these duplicate records.

Most people would consider this extremely unacceptable.

> The client then tries, for each record returned by the server, decrypting ckFile2 with the client's hFile value, potentially producing kFile. If this is successful, the client then decrypts cFile with kFile, producing pFile. Finally, it compares this pFile to the original. If it matches, a match has been found, and the client exits the loop. If not, (or if either of the two decryption steps failed), it continues to the next record the server returned. If there are no more records, the client instead submits the tuple (cFile, ckFile, hpFileTrunc) to the server, which stores it.

Why would the client have the keys to files stored by other users ?

Unless you mean that you can only deduplicate within a single client, in which case that's of much more limited use (and I might add, your encryption scheme is way more complex than it needs to be).

1 comments

XMPPwocky 3078 days ago

> And by doing this, provides a way for clients to verify if any user on the file storage server has this file. So if I wanted to know if your mozilla thunderbird has a mail I have the source to, I simply try to store this and get these duplicate records.

Yes. This is the reason you don't want this property (being able to deduplicate encrypted files)!

But you can provide it, while still providing meaningful security against other attacks.

The client has the keys to files stored by other users because the keys are the hashes of the plaintext, and the client can hash its own plaintext when it has the file.

(Note a trivial modification to this scheme, solely client-side, allows for certain files to be totally secure, with the cost of them being exempt from deduplication)

link

candiodari 3078 days ago

> The client has the keys to files stored by other users because the keys are the hashes of the plaintext

Personally I find only people explicitly authorized have the key to be the whole point of security. And you're suggesting this as a solution to the problem that organizations providing file storage could see what files you're storing.

Under this scheme, it wouldn't just be that organization, but everybody who is a client, that could see what files you're storing (or at least verify if you're storing a particular file or not)

So I find your assessment:

> But you can provide it, while still providing meaningful security against other attacks.

Very dubious indeed, especially given the context of securing centralized file storage, where the whole point would be to deny others access.

I mean it's a true statement, because you don't specify what "other attacks" are.

I posit that given that this system leaks the plaintext of your files I find it strictly worse than just giving Dropbox or Microsoft access to my files.

link

XMPPwocky 3077 days ago

> Under this scheme, it wouldn't just be that organization, but everybody who is a client, that could see what files you're storing (or at least verify if you're storing a particular file or not)

You can do this today, with Dropbox or whatever else- anything that does deduplication, if it saves bandwidth by not asking for files it already has.

You can't tell who is storing a particular file- only if anybody is. Does this leak information and impact privacy? Yes! But it still provides other useful properties.

If you have a copy of a file, you can see if anybody else does- a boolean value. (And if the server is malicious, it can tell who does (if it logs).) If you don't have a copy of a file, you can learn absolutely nothing about it.

So, for example, if a user uploads a, uh, personal image to the service- with Dropbox, in theory (they likely have strong organizational and technical controls against this sort of thing, mind you) if the server is malicious they can view that image.

With this scheme, the server can't.

On the other hand, if you, say, save a file containing only your social security number- or a similar low-entropy value- the server can crack the hash and decrypt that file. That's the price you pay for being able to deduplicate.

(Perhaps one could only deduplicate large files- thus handling the case of movies, music, Ubuntu ISOs, large system files, etc. To implement selective deduplication- if you want a file to not be deduped, replace all uses of its hash with, instead, a unique random value to identify the file. Server requires no modification.)

link