Hacker News
by ryao 3693 days ago
ZFS' data deduplication requires very little memory to function. However, it must check every new record write against every record already written under it. The only way to do this in a performant way is to lean on cache. Without sufficient cache, you degrade to performing 3 random IOs for each record write, which performs terribly. The system will continue to run, but it will be slow.
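
A minimal sketch of that cost model, not ZFS code: a toy dedup table keyed by checksum, with a bounded in-memory LRU cache in front of it. The names (`ToyDedupTable`, `IOS_PER_MISS`, `run`) are hypothetical, and the figure of 3 random IOs per cache miss is simply taken from the comment above rather than measured.

```python
import hashlib
from collections import OrderedDict

IOS_PER_MISS = 3  # assumption: figure quoted in the comment, not a measurement


class ToyDedupTable:
    """Toy model: the full table lives 'on disk'; an LRU cache holds part of it."""

    def __init__(self, cache_entries):
        self.on_disk = {}           # checksum -> reference count (stand-in for the on-disk table)
        self.cache = OrderedDict()  # LRU set of checksums whose entries are in RAM
        self.cache_entries = cache_entries
        self.random_ios = 0         # simulated random IOs caused by cache misses

    def write(self, record: bytes) -> bool:
        """Write one record; return True if it matched an existing record."""
        key = hashlib.sha256(record).digest()
        if key in self.cache:
            self.cache.move_to_end(key)  # cache hit: no extra IO
        else:
            # Cache miss: the entry has to be looked up on disk.
            self.random_ios += IOS_PER_MISS
            self.cache[key] = None
            if len(self.cache) > self.cache_entries:
                self.cache.popitem(last=False)  # evict least recently used
        duplicate = key in self.on_disk
        self.on_disk[key] = self.on_disk.get(key, 0) + 1
        return duplicate


def run(cache_entries, n_unique=1000, passes=2):
    """Write the same n_unique records `passes` times; return simulated random IOs."""
    table = ToyDedupTable(cache_entries)
    for _ in range(passes):
        for i in range(n_unique):
            table.write(i.to_bytes(8, "little"))
    return table.random_ios
```

In this toy model deduplication stays correct whatever the cache size; only the IO count changes. With a cache that holds all 1000 checksums, the second pass costs no extra IO. With a cache of 10 entries, every write in both passes misses.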

As far as I know, there is no way to implement online deduplication with constant RAM usage without performing poorly as things scale or playing Schrödinger's cat with whether data that should deduplicate is subject to deduplication. Offline data deduplication might work, but it would either cripple performance or compromise ZFS' data integrity guarantees.

If HAMMER has online data deduplication that is performant with constant RAM, they likely made a sacrifice elsewhere to get it. My guess is that it misses cases, such that while you would expect each unique record to be stored only once, duplicates can still be written multiple times.
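
To illustrate the kind of trade-off guessed at above (this is not HAMMER's actual implementation, which is not described here): a deduplicator that remembers only a fixed number of recent checksums uses constant RAM, but stores a duplicate again once its checksum has been evicted. The function name and parameters are hypothetical.

```python
import hashlib
from collections import OrderedDict


def count_stored_records(records, max_entries):
    """Return how many records get physically stored with a bounded checksum index."""
    recent = OrderedDict()  # bounded LRU of checksums; the only dedup state kept
    stored = 0
    for record in records:
        key = hashlib.sha256(record).digest()
        if key in recent:
            recent.move_to_end(key)  # duplicate detected, nothing new stored
            continue
        stored += 1                  # never seen, or seen but already forgotten
        recent[key] = None
        if len(recent) > max_entries:
            recent.popitem(last=False)  # constant RAM: forget the oldest checksum
    return stored


# 100 distinct records, each written twice.
records = [bytes([i]) for i in range(100)] * 2
```

With `max_entries=100` the index remembers everything and only 100 records are stored. With `max_entries=10` every checksum has been forgotten by the time its duplicate arrives, so all 200 writes are stored.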