Hacker News
by mlyle 1085 days ago
> Edit: What's the RAM overhead of this? Is it ~64B per 128kB deduped block or what's the magnitude of things?

No real memory impact. There's a regions table that uses 128 KiB of memory per terabyte of total storage (and may be a bit more in the future). So for your 10 petabyte pool using deduping, you'd better have roughly an extra 1.25 GB of RAM.

But erasing files can potentially be twice as expensive in IOPS, even if not deduped. They try to prevent this.


If I manually deduplicate/copy a 1GB file, how many such offset/refcount tuples would be created in RAM? Just one for the whole file, or one per underlying 128kB block?
n.b. I am not pjd, this is from memory and may be wrong.

The answer to that is messy. Basically, there's a table that should be kept in memory answering "is there anything reflinked at all in this logical range on disk", and each entry covers a large span, so how many entries you get depends on how contiguous your data is logically on disk. The precise per-vdev mapping list doesn't need to be kept in memory continuously, just the coarser table, which saves you a fair bit on memory requirements.

They don't go in RAM. You have one 64-bit refcount for every 64 MB of storage, irrespective of how much is deduplicated; that's a reduction factor of 8M, or 128 KiB per terabyte stored.
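The figures above are easy to check by hand. Here's a quick sketch of the arithmetic, assuming binary units (1 TB = 2^40 bytes) and taking the 64 MB region size and 8-byte counter from this comment; the function and constant names are made up for illustration.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4
PIB = 1024 ** 5

REGION_SIZE = 64 * MIB  # storage covered by one counter (figure from the comment)
COUNTER_SIZE = 8        # one 64-bit refcount, in bytes


def region_table_bytes(pool_bytes: int) -> int:
    """RAM for one counter per region, rounding a partial region up."""
    regions = -(-pool_bytes // REGION_SIZE)  # ceiling division
    return regions * COUNTER_SIZE


# How many bytes of storage each byte of table accounts for.
reduction_factor = REGION_SIZE // COUNTER_SIZE
per_tib = region_table_bytes(1 * TIB)
for_10_pib = region_table_bytes(10 * PIB)

print(reduction_factor // MIB, "Mi reduction factor")
print(per_tib // KIB, "KiB of table per TiB")
print(for_10_pib / GIB, "GiB of table for a 10 PiB pool")
```

With decimal units the totals shift by a few percent, but the order of magnitude is the same: about a gigabyte for a 10 petabyte pool.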

This largely exists so you can avoid doing extra work on free.
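To make the "avoid extra work on free" point concrete, here's a toy model of the idea. It is not the real implementation: `CloneTracker`, its fields, and the dict standing in for the precise mapping are all hypothetical, and only the 64 MB region size comes from this thread.

```python
REGION_SIZE = 64 * 1024 * 1024  # bytes of storage per coarse counter (from the thread)


class CloneTracker:
    """Toy model: coarse per-region counters in RAM plus a precise per-block map.

    The dict stands in for the precise mapping that would not normally be
    held in memory. All names here are hypothetical.
    """

    def __init__(self, pool_bytes: int) -> None:
        self.region_counts = [0] * (-(-pool_bytes // REGION_SIZE))
        self.precise = {}         # block offset -> number of extra references
        self.precise_lookups = 0  # how often the expensive path was taken

    def clone(self, offset: int) -> None:
        """Record one extra reference to the block at `offset`."""
        if offset not in self.precise:
            # First clone of this block: the region now has one more tracked block.
            self.region_counts[offset // REGION_SIZE] += 1
        self.precise[offset] = self.precise.get(offset, 0) + 1

    def free(self, offset: int) -> bool:
        """Return True if the block can really be released."""
        region = offset // REGION_SIZE
        if self.region_counts[region] == 0:
            return True            # fast path: nothing cloned in this region
        self.precise_lookups += 1  # slow path: consult the precise mapping
        refs = self.precise.get(offset, 0)
        if refs == 0:
            return True            # region has clones, but not this block
        if refs == 1:
            del self.precise[offset]
            self.region_counts[region] -= 1
        else:
            self.precise[offset] = refs - 1
        return False               # another reference still uses the block


tracker = CloneTracker(pool_bytes=1024 ** 4)
tracker.clone(offset=0)
```

Freeing a block in a region with no clones never touches the precise map, which is the cheap common case. Only frees that land in a region with a nonzero counter pay for the extra lookup.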