Hacker News
by idle_zealot 612 days ago
Is that true? Couldn't you run dedupe on blocks of encrypted files? I assume there would be fewer duplicate blocks than in the cleartext, but if you have a bunch of blocks full of random bits, there are bound to be repeats with a large enough number of blocks.

If you can, you've effectively broken the encryption. Any scheme that takes random data and stores it in less space, when accounting for the overhead of the scheme itself, is astronomically unlikely to succeed by more than a few bits saved in any specific example (and on average across all such random streams cannot save space at all).
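A quick way to check the counting part of this argument (the pigeonhole principle): there are more n-bit inputs than there are strictly shorter bitstrings, so no lossless scheme can shrink every input. This sketch only counts strings; the helper names are made up for illustration, and it shows the "cannot shrink every input" half, not the stronger "no average saving" claim.

```python
# Pigeonhole check: there are 2**n distinct n-bit inputs, but only
# 2**0 + 2**1 + ... + 2**(n-1) = 2**n - 1 distinct bitstrings shorter than n bits.
def shorter_outputs(n: int) -> int:
    """Number of bitstrings with length 0..n-1 (hypothetical helper)."""
    return sum(2**k for k in range(n))

def inputs(n: int) -> int:
    """Number of distinct n-bit inputs (hypothetical helper)."""
    return 2**n

for n in (8, 16, 256):
    # A lossless (injective) map needs at least as many outputs as inputs,
    # so at least one n-bit input cannot be mapped to anything shorter.
    print(n, inputs(n) - shorter_outputs(n))  # prints 1 for every n
```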
Indeed. Borg, for example, is E2E-encrypted but still able to dedupe.

My bookmark archive is 10TB, but the deduped on-disk size is 100GB because most files are the same across backups!

https://www.borgbackup.org/
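For context on how an encrypted backup tool can still dedupe: as I understand Borg's design, chunk IDs are a keyed MAC of the *plaintext* chunk, and encryption happens after the dedupe lookup. Below is a minimal sketch under that assumption. It is not Borg's code or repository format; it uses fixed-size chunks (Borg uses content-defined chunking), and the class and method names are hypothetical. The encryption step is only marked with a comment.

```python
import hashlib
import hmac
import os

CHUNK_SIZE = 4096  # assumption: fixed-size chunks for simplicity

class DedupStore:
    """Hypothetical dedupe-before-encrypt store (illustration only)."""

    def __init__(self, id_key: bytes):
        self.id_key = id_key  # secret key used to derive chunk IDs
        self.chunks = {}      # chunk_id -> stored chunk

    def chunk_id(self, plaintext: bytes) -> bytes:
        # Keyed MAC of the plaintext: identical chunks get identical IDs,
        # but someone without the key cannot compute or guess IDs.
        return hmac.new(self.id_key, plaintext, hashlib.sha256).digest()

    def add_file(self, data: bytes) -> list:
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            piece = data[i:i + CHUNK_SIZE]
            cid = self.chunk_id(piece)
            if cid not in self.chunks:
                # Placeholder: a real tool would encrypt `piece` here before storing it.
                self.chunks[cid] = piece
            ids.append(cid)
        return ids  # the file's "recipe": ordered list of chunk IDs

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

store = DedupStore(id_key=os.urandom(32))
snapshot = os.urandom(10 * CHUNK_SIZE)
for _ in range(100):  # back up the same data 100 times
    store.add_file(snapshot)
print(store.stored_bytes())  # 40960: one copy of each of the 10 chunks
```

The point of the keyed ID is that duplicate detection works on plaintext equality while the stored chunks and their IDs reveal nothing useful to whoever holds the repository.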

That’s not the same thing at all.
Same thing as what?

Parent was asking about deduping encrypted data.

Someone said (wrongly) it’s impossible and I shared a popular project that does exactly that.

“Not backing up the same file twice” is not the same thing as deduplicating encrypted data, as encryption has no relevance there. You can do that with or without encryption.
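To illustrate the distinction being drawn here: with randomized encryption (a fresh nonce per message), the same plaintext encrypted twice yields unrelated ciphertexts, so deduplicating the ciphertext blocks themselves finds nothing, even though the plaintext is extremely redundant. The cipher below is a toy keystream built from SHA-256, chosen only because it needs nothing outside the standard library; it is an assumption for illustration, not a vetted construction, and real tools would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import os

BLOCK = 32  # bytes per block, matching the SHA-256 digest size

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher, illustration only: keystream = SHA-256(key || nonce || offset)."""
    nonce = os.urandom(16)  # fresh random nonce for every encryption
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        keystream = hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        block = plaintext[i:i + BLOCK]
        out += bytes(a ^ b for a, b in zip(block, keystream))
    return nonce + bytes(out)  # nonce is stored in front of the ciphertext

def blocks(data: bytes) -> set:
    """Set of distinct fixed-size blocks, i.e. what a block-level dedupe would keep."""
    return {data[i:i + BLOCK] for i in range(0, len(data), BLOCK)}

key = os.urandom(32)
plain = b"A" * (BLOCK * 64)  # 64 identical plaintext blocks
c1 = toy_encrypt(key, plain)
c2 = toy_encrypt(key, plain)

print(len(blocks(plain)))            # 1: the plaintext dedupes to a single block
print(len(blocks(c1) & blocks(c2)))  # expected 0: the two ciphertexts share no blocks
```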
Even with blocks of just 32 bytes of random data, the chance of any two blocks ever colliding is astronomically low.
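The standard birthday (union) bound puts a number on this: for n uniformly random b-bit blocks, P(any collision) ≤ n(n−1)/2 / 2^b. The sketch below evaluates that bound; the choice of 2^40 blocks (32 TiB of 32-byte blocks) is just an illustrative assumption.

```python
import math

def collision_bound(num_blocks: int, block_bytes: int) -> float:
    """Birthday/union upper bound on P(some pair of random blocks is equal):
    C(n, 2) / 2**bits."""
    bits = 8 * block_bytes
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs / 2**bits  # exact int / int division, then rounded to a float

# 2**40 random 32-byte blocks is 32 TiB of data.
p = collision_bound(2**40, 32)
print(math.log2(p))  # about -177, i.e. p is roughly 2**-177
```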