by rincebrain 3356 days ago
I don't think dedup being "rushed" is the problem - dedup is often implemented "offline" (as in NTFS's implementation, or btrfs), so the data gets written as unique at first, and then eventually something runs through, finds duplicates, and rewrites history to point all the duplicate instances to one copy.
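To make the "offline" approach concrete, here is a minimal toy sketch in Python. It is not how NTFS or btrfs actually do it; the `offline_dedup` function and the `blocks`/`pointers` dictionaries are hypothetical stand-ins for a block store and its logical-to-physical mapping.

```python
import hashlib

def offline_dedup(blocks, pointers):
    """Toy offline dedup pass.

    blocks:   dict mapping block_id -> bytes (everything was written as unique)
    pointers: dict mapping logical address -> block_id
    Returns the number of physical blocks freed.
    """
    canonical = {}  # content digest -> block_id of the copy we keep
    for addr in sorted(pointers):
        block_id = pointers[addr]
        digest = hashlib.sha256(blocks[block_id]).digest()
        keeper = canonical.setdefault(digest, block_id)
        if keeper != block_id:
            # "Rewrite history": repoint this address at the kept copy.
            pointers[addr] = keeper
    # Any block no longer referenced by a pointer can be freed.
    live = set(pointers.values())
    freed = [b for b in blocks if b not in live]
    for b in freed:
        del blocks[b]
    return len(freed)

blocks = {0: b"aaaa", 1: b"bbbb", 2: b"aaaa"}
pointers = {100: 0, 101: 1, 102: 2}
freed = offline_dedup(blocks, pointers)
```

The point is that the pass runs after the fact and only needs its lookup table while it is scanning, not on every write.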

But ZFS deeply hardcodes assumptions which mean you don't get to rewrite history like that, so it has to dedup synchronously at write time (and keep the ever-growing data structures this requires in memory for every write).
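For contrast, a toy sketch of the synchronous variant, again in Python. This is an illustration under my own assumptions, not ZFS's actual on-disk or in-memory structures; `InlineDedupStore` and its fields are hypothetical names.

```python
import hashlib

class InlineDedupStore:
    """Toy write-time dedup: every write consults an in-memory table."""

    def __init__(self):
        self.table = {}   # content digest -> [block_id, refcount]
        self.blocks = {}  # block_id -> bytes actually stored
        self.next_id = 0

    def write(self, data):
        digest = hashlib.sha256(data).digest()
        entry = self.table.get(digest)
        if entry is not None:
            # Duplicate: bump the refcount, store nothing new.
            entry[1] += 1
            return entry[0]
        # Unique: store the block and add a table entry for it.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.table[digest] = [block_id, 1]
        return block_id

store = InlineDedupStore()
a = store.write(b"x")
b = store.write(b"y")
c = store.write(b"x")
```

Here the table gains one entry per unique block ever written and sits on the write path, which is why its size relative to available memory matters so much.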

I don't think an arbitrarily larger amount of time or money behind it would have permitted a better implementation, short of a ZFS2 and an in-place migration tool.

2 comments

Dedup was rushed. This is supported by the original authors. I believe it was discussed in detail in a presentation by Matthew Ahrens. If I could remember the specific source I would link it, but it did not get the same level of testing and care as other features.
Dedup can be done right if the system has enough RAM.
I don't know much about ZFS' deduplication, just heard that it requires a lot of memory, in a "hard minimum amount" way, to do it. This suggests, to me, that at least one design element of their deduplication engine is poor.

Efficient deduplication is design-wise a rather difficult problem with many trade-offs and issues which can blow your lower torso clean off when done wrong.

I don't think there is a system (beyond sheer coincidence, which seems rather unlikely given the complexity of the problem space) that can support good deduplication in an "added on later" way.

E.g. ext4 and btrfs have extent sharing, which does work but is very inefficient (time). ZFS seems to be inefficient as well (space).

Offhand, I'm not aware of an open source deduplicating file system that does not have these issues. There are the deduplicating archivers (borg, restic, some others), but these are neither meant to be nor trying to be general-purpose filesystems (although borg offers a read-only FUSE FS with satisfactory performance).

DragonFly BSD's HAMMER filesystem seems to fit the bill nicely. There's even an option to limit the maximum amount of memory used for deduplication. Look up memlimit in the manual page: https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&sec...
The dedup heuristic I've heard is 2-3 GB of RAM per TB of raw storage.
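As a rough worked example of that rule of thumb, a small Python sketch. The function name and the 10 TB pool size are hypothetical; the 2-3 GB per TB figures are just the heuristic quoted above, not a measured number.

```python
def dedup_ram_estimate_gb(raw_tb, gb_per_tb_low=2.0, gb_per_tb_high=3.0):
    """Return a (low, high) RAM estimate in GB for a pool of raw_tb terabytes."""
    return (raw_tb * gb_per_tb_low, raw_tb * gb_per_tb_high)

# A hypothetical 10 TB pool would want roughly 20 to 30 GB of RAM by this heuristic.
low, high = dedup_ram_estimate_gb(10)
```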