by gigatexal 3356 days ago
Dedup can be done right if the system has enough RAM.

I don't know much about ZFS's deduplication; I've only heard that it requires a lot of memory, as a "hard minimum amount", to work at all. To me, this suggests that at least one element of its deduplication engine's design is poor.
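A minimal sketch of inline block-level deduplication shows where the memory goes. It assumes fixed 4 KiB blocks and SHA-256 digests, and the `DedupStore` class is hypothetical; it is not how ZFS is implemented. Every write must look its digest up in a table holding one entry per unique block, so the table grows with the amount of unique data, and once it no longer fits in RAM each lookup turns into a random disk read.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for this sketch


class DedupStore:
    """Toy inline-dedup block store: one table entry per unique block."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []      # physical storage, one slot per unique block
        self.index: dict[bytes, int] = {}  # digest -> slot (the "dedup table")
        self.refcount: list[int] = []      # number of references to each slot

    def write_block(self, data: bytes) -> int:
        """Store one block and return its slot; duplicate content is stored once."""
        digest = hashlib.sha256(data).digest()
        slot = self.index.get(digest)
        if slot is None:
            # New content: allocate a physical slot and add a table entry.
            slot = len(self.blocks)
            self.blocks.append(data)
            self.refcount.append(0)
            self.index[digest] = slot
        self.refcount[slot] += 1
        return slot

    def write(self, payload: bytes) -> list[int]:
        """Split the payload into fixed-size blocks and write each one."""
        return [self.write_block(payload[i:i + BLOCK_SIZE])
                for i in range(0, len(payload), BLOCK_SIZE)]


store = DedupStore()
layout = store.write(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)
print(layout)            # [0, 0, 0, 1]
print(len(store.index))  # 2
```

Writing three identical blocks followed by a different one stores only two physical blocks, but the index must still be consulted for all four writes.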

Design-wise, efficient deduplication is a rather difficult problem with many trade-offs and pitfalls that can blow your lower torso clean off when done wrong.

I don't think there is a system that can support good deduplication in an "added on later" way (barring sheer coincidence, which seems rather unlikely given the complexity of the problem space).

E.g. XFS and btrfs have extent sharing, which does work but is completely inefficient in time. ZFS seems to be inefficient as well, but in space.

Off the top of my head, I'm not aware of an open-source deduplicating filesystem that doesn't have these issues. There are the deduplicating archivers (borg, restic and some others), but these are neither meant nor intended to be general-purpose filesystems (although borg offers a read-only FUSE filesystem with satisfactory performance).
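Archivers such as borg and restic split data into variable-size chunks using content-defined chunking (borg with a buzhash rolling hash, restic with Rabin fingerprints), so inserting bytes near the start of a file only changes the chunks around the edit. The sketch below swaps in a simple polynomial rolling hash; the window size, mask and size limits are illustrative assumptions, not either tool's parameters.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not borg's or restic's defaults).
WINDOW = 48                 # bytes covered by the rolling hash
MASK = (1 << 11) - 1        # cut when the low 11 bits are zero (~1 in 2048 positions)
MIN_CHUNK = 256
MAX_CHUNK = 16 * 1024
BASE = 257
MOD = (1 << 61) - 1
OUT_WEIGHT = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by the content of the last WINDOW bytes."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the byte that slides out of the window.
            h = (h - data[i - WINDOW] * OUT_WEIGHT) % MOD
        h = (h * BASE + byte) % MOD
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Insert a few bytes at the front and count how many chunks survive unchanged.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(200_000))
edited = b"inserted!" + original
before = {hashlib.sha256(c).digest() for c in chunk(original)}
after = {hashlib.sha256(c).digest() for c in chunk(edited)}
print(f"{len(before & after)} of {len(before)} chunks reused after the edit")
```

Because cut points depend only on nearby content, the boundaries after the insertion realign with the original ones and nearly all chunk digests are shared; with fixed-size blocks, every block after the insertion point would change.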

DragonFly BSD's HAMMER filesystem seems to fit the bill nicely. There's even an option to limit the maximum amount of memory used for deduplication; look up memlimit in the manual page: https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&sec...
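A memory cap like this trades time for space. One generic way to bound the dedup table (a sketch of the idea, not HAMMER's actual implementation) is to split the digest space into partitions and scan the data once per partition, so each pass only holds the entries whose digest falls into its own partition. The function `find_duplicates` and its `n_passes` parameter are hypothetical: more passes mean a smaller table but more rereading and rehashing of the same data.

```python
import hashlib


def find_duplicates(blocks: list[bytes], n_passes: int) -> tuple[dict[int, int], int]:
    """Map each duplicate block index to the index of its first occurrence.

    The digest space is split into n_passes partitions and each pass keeps
    table entries only for its own partition, so the peak table size shrinks
    roughly by a factor of n_passes at the cost of rereading every block
    n_passes times. Returns (duplicates, peak_table_entries).
    """
    duplicates: dict[int, int] = {}
    peak = 0
    for p in range(n_passes):
        table: dict[bytes, int] = {}  # digest -> first block index, this partition only
        for i, block in enumerate(blocks):
            digest = hashlib.sha256(block).digest()
            if digest[0] % n_passes != p:
                continue  # handled by another pass
            if digest in table:
                duplicates[i] = table[digest]
            else:
                table[digest] = i
        peak = max(peak, len(table))
    return duplicates, peak


# 200 blocks with only 50 distinct contents.
blocks = [bytes([i % 50]) * 4096 for i in range(200)]
one_pass, peak1 = find_duplicates(blocks, 1)
four_pass, peak4 = find_duplicates(blocks, 4)
print(one_pass == four_pass, peak1, peak4)
```

Both runs find the same duplicates; the four-pass run simply never holds more than one partition's worth of entries at a time.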
The dedup heuristic I've heard is 2-3 GB of RAM per TB of raw storage.
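The heuristic can be sanity-checked with back-of-the-envelope arithmetic: the table needs one entry per unique block, so RAM is roughly (storage size / average block size) x bytes per entry. The inputs are assumptions: about 320 bytes per in-memory ZFS dedup-table entry is a commonly quoted estimate, 128 KiB is ZFS's default recordsize, and the worst case of all-unique blocks is used.

```python
KIB = 2**10
GIB = 2**30
TIB = 2**40


def ddt_ram_bytes(pool_bytes: int, avg_block_bytes: int, entry_bytes: int = 320) -> int:
    """Estimate dedup-table RAM assuming one entry per block (all blocks unique).

    entry_bytes=320 is an assumed, commonly quoted per-entry size.
    """
    return (pool_bytes // avg_block_bytes) * entry_bytes


for block_kib in (128, 64, 8):
    gib = ddt_ram_bytes(1 * TIB, block_kib * KIB) / GIB
    print(f"{block_kib:>3} KiB blocks: {gib:.1f} GiB of RAM per TiB")
```

With 128 KiB blocks this gives 2.5 GiB per TiB, in line with the 2-3 GB figure; 64 KiB blocks double it to 5.0 GiB, and 8 KiB blocks push it to 40.0 GiB, which is why the heuristic depends heavily on block size.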