by traceroute66 1180 days ago
> I would probably put the data on a ZFS volume with dedupe instead

But doesn't ZFS dedupe eat RAM for breakfast? Double-digit GB of RAM per TB of data, IIRC?

3 comments

The DDT (deduplication table) used to require ~1GB RAM for every 1TB data written over the life of the file system. Deleting data from the file system wouldn't remove the dedupe references, you'd have to recreate the pool entirely and start over with a new DDT.

However, there are now special devices that can be used to store the DDT. Typically this is done with two SSDs configured as a mirrored vdev for the DDT metadata. This reduces the memory overhead, but it costs some performance and still has the same limitation: the DDT size can only be reduced by re-creating the pool.

Yes, this is what I do, and I wouldn't enable dedupe for all datasets. In this case he wanted dedupe on his games, so I would enable it for just that dataset.

But I do run disk-wide compression with no problems and have done so on all my datasets for many years now, and it's been a tremendous space saver. Especially on machines where I have a lot of VMs/containers, it's not unusual for me to see a compression ratio of 2 on these with good old lz4. It will be interesting to see what damage zstd will do once I start experimenting with that.
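For anyone following along, here is a rough sketch of what "dedupe on one dataset, compression everywhere" looks like as commands. It only builds and prints the `zfs` command lines, it doesn't run them. The pool name `tank` and the dataset `tank/games` are made up for illustration, and the helper functions are hypothetical; only the `dedup`, `compression` and `compressratio` property names are real ZFS properties.

```python
# Sketch only: builds zfs command lines without executing them.
# "tank" and "tank/games" are hypothetical names; substitute your own.

def zfs_set(dataset: str, prop: str, value: str) -> list[str]:
    """Return the argv for `zfs set prop=value dataset`."""
    return ["zfs", "set", f"{prop}={value}", dataset]


def zfs_get(dataset: str, prop: str) -> list[str]:
    """Return the argv for `zfs get prop dataset`."""
    return ["zfs", "get", prop, dataset]


POOL = "tank"          # hypothetical pool name
GAMES = "tank/games"   # hypothetical dataset holding the games

commands = [
    zfs_set(POOL, "compression", "lz4"),  # inherited by child datasets
    zfs_set(GAMES, "dedup", "on"),        # dedup only where it pays off
    zfs_get(POOL, "compressratio"),       # check the achieved ratio later
]

for argv in commands:
    print(" ".join(argv))
```

Properties set on the pool's root dataset are inherited by children unless overridden, which is why compression goes on the pool and dedup on the single dataset.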

+1 for compression! Unless you have very specific performance needs, there's no reason not to use compression. In fact TrueNAS (previously known as FreeNAS) has it enabled by default.

I've yet to compare lz4 to zstd myself, but I've read great things about zstd.

No, only for deduplication, which is optional and doesn't make sense for most workloads. And usage depends on record (like block) size and other factors. Usually 1GB per TB of deduplicated storage is the max. ZFS works on datasets for configs like this, so you can dedup a small "filesystem" of data but not all 40TB on a pool.
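To make the record-size dependence concrete, here is a back-of-the-envelope sketch. It assumes roughly 320 bytes of RAM per DDT entry, which is a commonly quoted approximation and not something stated in this thread, and it assumes one entry per unique block. Treat the output as an order-of-magnitude guide, not a measurement.

```python
# Rough DDT memory estimate. BYTES_PER_DDT_ENTRY is an ASSUMPTION
# (a commonly quoted approximation), not a value measured on a real pool.
BYTES_PER_DDT_ENTRY = 320

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def estimate_ddt_ram_bytes(unique_data_bytes: int,
                           avg_block_bytes: int,
                           bytes_per_entry: int = BYTES_PER_DDT_ENTRY) -> int:
    """Estimate DDT size as (number of unique blocks) * (bytes per entry)."""
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # Ceiling division: a partial block still needs its own entry.
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)
    return unique_blocks * bytes_per_entry


for block_kib in (16, 128, 1024):
    ram = estimate_ddt_ram_bytes(TIB, block_kib * KIB)
    print(f"{block_kib:>5} KiB blocks: ~{ram / GIB:.2f} GiB of DDT per TiB of unique data")
```

Under these assumptions, small blocks push the estimate into double-digit GiB per TiB, while large records bring it well under 1 GiB, which may explain why people quote such different numbers.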

It will use lots of RAM for ARC (adaptive replacement cache) but that can be limited.
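As a sketch of how the ARC limit is usually set, assuming OpenZFS on Linux: the cap is the `zfs_arc_max` module parameter, given in bytes, typically placed in a file such as `/etc/modprobe.d/zfs.conf`. Other platforms use differently named tunables. The helper below is hypothetical and only formats the line; the 8 GiB figure is an arbitrary example.

```python
# Sketch, assuming OpenZFS on Linux, where the ARC cap is the zfs_arc_max
# module parameter (in bytes). The helper name is made up for illustration.

GIB = 1024 ** 3


def arc_max_modprobe_line(limit_gib: float) -> str:
    """Format a modprobe options line capping ARC at limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("limit_gib must be positive")
    limit_bytes = int(limit_gib * GIB)
    return f"options zfs zfs_arc_max={limit_bytes}"


# Example: cap ARC at 8 GiB (arbitrary value for illustration).
print(arc_max_modprobe_line(8))
```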

ZFS RAM usage is greatly overblown in my opinion and experience.

Agreed. People forget how much more RAM we have now versus when ZFS was developed. It's like old emacs swapping jokes.

It's usually along the lines of 1GB per TB. Some factors can affect that number but I've found it pretty accurate. Note that's 1GB of RAM that ends up wired to ZFS and never usable by the rest of the system.