Hacker News
by michaelmrose 1831 days ago
There is literally only one feature that uses massive amounts of memory: online deduplication, which relies on keeping an in-RAM table of duplicated blocks. This means that the more duplication you have, the larger the table is.

FreeBSD Mastery: ZFS by Michael Lucas around pg 174

Deduplication Memory Needs
==========================

"For a rough-and-dirty approximation, you can assume that 1 TB of deduplicated data uses about 5 GB of RAM. You can more closely approximate memory needs for your particular data by looking at your data pool and doing some math. We recommend always doing the math and computing how much RAM your data needs, then using the most pessimistic result. If the math gives you a number above 5 GB, use your math. If not, assume 5 GB per terabyte."

https://www.tiltedwindmillpress.com/?product=fmzfs

This is not to say you need 5GB for every 1TB of data. It doesn't even mean you need 5GB of RAM for every 1TB on which you have enabled dedup. It means you need approximately 5GB of RAM for each TB of data that is both duplicated and residing on a dataset for which you have enabled dedup. Because the memory cost of dedup is high and rises in direct proportion to its utility, it's only useful in cases where you can plan ahead for its requirements. 99% of users are unlikely to use dedup. However, this doesn't stop some people (not you, obviously) from promoting the idea that ZFS requires 5GB of memory per TB, or some similarly absurd figure.
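To make the "do the math, then take the most pessimistic result" advice from the quote concrete, here is a rough sketch in Python. The function name and parameters are hypothetical. The 320 bytes per table entry is a commonly quoted approximation, not a figure from the quoted passage. Treating 5 GB per TB as 5 GiB per TiB is also an assumption made to keep the arithmetic tidy.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3

# Rule of thumb from the quoted passage, read here in binary units (assumption).
RULE_OF_THUMB_BYTES_PER_TIB = 5 * GIB


def dedup_ram_estimate(dedup_bytes, avg_block_bytes, bytes_per_entry=320):
    """Return a pessimistic RAM estimate, in bytes, for a dedup table.

    dedup_bytes:     amount of data the table is expected to cover
    avg_block_bytes: average block size of that data
    bytes_per_entry: assumed in-memory size of one table entry
                     (320 is a commonly quoted approximation)
    """
    # One table entry per tracked block, rounded up.
    entries = -(-dedup_bytes // avg_block_bytes)
    computed = entries * bytes_per_entry

    # The flat 5-per-terabyte figure, scaled to the same amount of data.
    rule_of_thumb = dedup_bytes * RULE_OF_THUMB_BYTES_PER_TIB // TIB

    # Per the quote: use your own math only if it exceeds the rule of thumb.
    return max(computed, rule_of_thumb)


if __name__ == "__main__":
    for block_kib in (16, 64, 128):
        estimate = dedup_ram_estimate(TIB, block_kib * 1024)
        print(f"1 TiB at {block_kib} KiB blocks: {estimate / GIB:.1f} GiB")
```

Under these assumptions, 64 KiB average blocks land right on the 5-per-terabyte figure. Smaller blocks push the computed number above it, and larger blocks fall back to the rule of thumb.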

As an aside, I really liked the book. I found it easy to read and understand, and very informative. Despite being focused on FreeBSD, it's mostly applicable to Linux as well.