|
> Under this scheme, it wouldn't just be that organization, but everybody who is a client, that could see what files you're storing (or at least verify if you're storing a particular file or not) You can do this today, with Dropbox or whatever else- anything that does deduplication, if it saves bandwidth by not asking for files it already has. You can't tell who is storing a particular file- only if anybody is. Does this leak information and impact privacy? Yes! But it still provides other useful properties. If you have a copy of a file, you can see if anybody else does- a boolean value. (And if the server is malicious, it can tell who does (if it logs).) If you don't have a copy of a file, you can learn absolutely nothing about it. So, for example, if a user uploads a, uh, personal image to the service- with Dropbox, in theory (they likely have strong organizational and technical controls against this sort of thing, mind you) if the server is malicious they can view that image. With this scheme, the server can't. On the other hand, if you, say, save a file containing only your social security number- or a similar low-entropy value- the server can crack the hash and decrypt that file. That's the price you pay for being able to deduplicate. (Perhaps one could only deduplicate large files- thus handling the case of movies, music, Ubuntu ISOs, large system files, etc. To implement selective deduplication- if you want a file to not be deduped, replace all uses of its hash with, instead, a unique random value to identify the file. Server requires no modification.) |