by eemil 612 days ago
One downside to encryption is that it prevents the server operator from doing any deduplication (file- or block-level) on their end.

Maybe that's one reason why cloud providers aren't pushing it that heavily, especially the big players, since more data means more duplication and therefore more efficient deduplication.
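A minimal sketch of why this happens, assuming server-side block dedup keyed by SHA-256 and 4 KiB blocks (both illustrative choices). `toy_encrypt` is a hypothetical stand-in for a real randomized cipher such as AES-GCM and is not secure. Because every block gets a fresh random nonce, uploading the same file twice produces different ciphertext each time, so the server finds nothing to deduplicate.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """TOY randomized encryption: random nonce + SHAKE-256 keystream XOR.
    Hypothetical stand-in for a real AEAD; do not use for real data."""
    nonce = os.urandom(16)
    keystream = hashlib.shake_256(key + nonce).digest(len(block))
    return nonce + bytes(a ^ b for a, b in zip(block, keystream))


def unique_blocks(blocks: list[bytes]) -> int:
    """Server-side dedup: each distinct block is stored once, keyed by its SHA-256."""
    return len({hashlib.sha256(b).digest() for b in blocks})


# The same 1 MiB file is uploaded twice (256 blocks per copy).
file_data = os.urandom(1024 * 1024)
uploads = split_blocks(file_data) * 2

key = os.urandom(32)
encrypted = [toy_encrypt(key, b) for b in uploads]

# Blocks uploaded, unique plaintext blocks, unique ciphertext blocks.
print(len(uploads), unique_blocks(uploads), unique_blocks(encrypted))
```

This prints `512 256 512`: the plaintext uploads dedupe down to one copy, while the encrypted uploads do not dedupe at all, even though the same key was used for both.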


Double-edged sword. Megaupload was doing it, and it was argued (successfully) in court that they therefore knew what they were hosting.
That's fine. We pay for storage. I'll pay extra to keep the host from spying on, selling, or otherwise exploiting my data.

Deduplication only really shines if most of the data is pirated copies. In reality, the vast majority of data is the fine detail of high-resolution photos and videos, which are completely uncorrelated with one another.

Is that true? Couldn't you run dedupe on blocks of encrypted files? I assume there would be fewer duplicate blocks than in the cleartext, but with a large enough number of blocks full of random bits, there are bound to be repeats.
If you can, you've effectively broken the encryption. Any scheme that takes random data and stores it in less space, once you account for the overhead of the scheme itself, is astronomically unlikely to save more than a few bits on any specific input (and, averaged over all such random streams, cannot save space at all).
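A quick way to see the incompressibility argument, assuming `os.urandom` output as a stand-in for well-encrypted ciphertext and zlib as the example compressor (any general-purpose compressor behaves the same way on random input):

```python
import os
import zlib

# ~1 MiB of random bytes: stands in for good ciphertext.
random_data = os.urandom(1024 * 1024)
# ~1 MiB of highly redundant bytes, for comparison.
redundant_data = b"hello world " * (1024 * 1024 // 12)

for name, data in [("random", random_data), ("redundant", redundant_data)]:
    compressed = zlib.compress(data, level=9)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes")
```

The random input comes out slightly larger than it went in (zlib falls back to stored blocks and adds its own framing), while the redundant input shrinks to a tiny fraction of its size.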
Indeed. Borg, for example, is end-to-end encrypted but still able to dedupe.

My bookmark archive is 10TB but deduped on-disk size is 100GB because most files are the same across backups!

https://www.borgbackup.org/
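A minimal sketch of the general idea behind client-side dedup with encryption. This is not Borg's code: Borg uses content-defined chunking and its own key handling, whereas fixed-size chunks and HMAC-SHA256 chunk IDs here are simplifying assumptions, and `DedupRepo` is a hypothetical name. The key point is that the dedup decision is made on the client, over plaintext, before anything would be encrypted; the server only ever sees opaque chunk IDs.

```python
import hashlib
import hmac
import os

CHUNK_SIZE = 4096  # assumed fixed-size chunks (Borg uses content-defined chunking)


class DedupRepo:
    """Hypothetical client-side store: chunk IDs are keyed MACs of the plaintext."""

    def __init__(self, id_key: bytes):
        self.id_key = id_key
        self.chunks: dict[bytes, bytes] = {}  # chunk ID -> stored chunk

    def chunk_id(self, chunk: bytes) -> bytes:
        # Keyed MAC, so someone without the key cannot confirm a guessed
        # file just by hashing it.
        return hmac.new(self.id_key, chunk, hashlib.sha256).digest()

    def backup(self, data: bytes) -> list[bytes]:
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = self.chunk_id(chunk)
            if cid not in self.chunks:
                # A real client would encrypt the chunk here before upload;
                # encryption is omitted to keep this sketch stdlib-only.
                self.chunks[cid] = chunk
            ids.append(cid)
        return ids  # an "archive" is just the list of chunk IDs

    def restore(self, ids: list[bytes]) -> bytes:
        return b"".join(self.chunks[cid] for cid in ids)


repo = DedupRepo(id_key=os.urandom(32))
bookmarks = os.urandom(256 * 1024)  # the same files in every backup
for _ in range(10):
    repo.backup(bookmarks)

stored = sum(len(c) for c in repo.chunks.values())
print(10 * len(bookmarks), stored)  # logical size vs. stored size
```

This prints `2621440 262144`: ten backups of the same 256 KiB occupy the space of one, the same effect as the 10TB-to-100GB archive above.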

That’s not the same thing at all.
Same thing as what?

Parent was asking about deduping encrypted data.

Someone said (wrongly) that it’s impossible, and I shared a popular project that does exactly that.

“Not backing up the same file twice” is not the same thing as deduplicating encrypted data, as encryption has no relevance there. You can do that with or without encryption.
Even with just 32 bytes of random data per block, the chance of ever seeing a collision is astronomically low.
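To put a number on it: by the union (birthday) bound, the probability that any two of n uniformly random b-bit blocks collide is at most n(n-1)/2^(b+1) < n^2/2^(b+1). A small sketch working in log2 to avoid tiny floats; the block counts are illustrative assumptions:

```python
import math


def collision_probability_log2(num_blocks: int, bits: int) -> float:
    """log2 of the birthday upper bound p < n^2 / 2^(bits + 1)."""
    return 2 * math.log2(num_blocks) - (bits + 1)


# 2^64 random 32-byte (256-bit) blocks, far more than any storage system holds.
print(collision_probability_log2(2**64, 256))  # -129.0, i.e. p < 2^-129
```

So even at 2^64 blocks, the chance of a single repeated 32-byte block is below 2^-129, and real blocks (kilobytes, not 32 bytes) make it smaller still.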