| HN Mirror

by ludde 1085 days ago

A M A Z I N G

Have been looking forward to this for years!

This is so much better than automatically doing dedup and the RAM overhead that entails.

Doing dedup as an offline size optimization, rather than keeping dedup state in RAM, seems like a really good path. It's in the spirit of paying only for what you use and not the rest.

Edit: What's the RAM overhead of this? Is it ~64B per 128kB deduped block or what's the magnitude of things?

2 comments

mlyle 1085 days ago

> Edit: What's the RAM overhead of this? Is it ~64B per 128kB deduped block or what's the magnitude of things?

No real memory impact. There's a regions table that uses 128k of memory per terabyte of total storage (and may be a bit more in the future). So for your 10 petabyte pool using deduping, you'd better have an extra gigabyte or so of RAM (about 1.25 GB at that rate).

But freeing files can potentially be twice as expensive in IOPS, even if nothing was deduped. They try to prevent this.
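
As a quick sanity check on the 128k-per-terabyte figure above, here is a minimal sketch of the arithmetic. It assumes binary units (KiB, TiB, PiB) and treats the rate as a fixed constant; the function name `regions_table_bytes` is made up for illustration and is not anything from ZFS itself.

```python
KIB = 1024
TIB = 1024**4
PIB = 1024**5

# Figure quoted in the comment: 128 KiB of table per TiB of pool storage.
# Binary units are an assumption; the comment just says "128k" and "terabyte".
TABLE_BYTES_PER_TIB = 128 * KIB


def regions_table_bytes(pool_bytes: int) -> int:
    """Estimated regions table size, in bytes, for a pool of the given size."""
    return pool_bytes * TABLE_BYTES_PER_TIB // TIB


if __name__ == "__main__":
    size = regions_table_bytes(10 * PIB)
    print(f"{size / 1024**3:.2f} GiB")  # prints "1.25 GiB"
```

So the "extra gigabyte" for a 10 petabyte pool is about right: 1.25 GiB under these assumptions.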

ludde 1085 days ago

If I manually deduplicate/copy a 1GB file, how many such offset/refcount tuples would be created in RAM? Just one for the whole file, or one per underlying 128kB block?

rincebrain 1085 days ago

n.b. I am not pjd, this is from memory and may be wrong.

The answer to that is messy. Basically, there's a table that should be kept in memory for "is there anything reflinked at all in this logical range on disk", and each entry covers a large span, so how many entries you get depends on how contiguous your data is on disk. The actual precise per-vdev mapping list doesn't need to be kept continuously in memory, just the coarser table, which saves you a fair bit on memory requirements.

mlyle 1084 days ago

They don't go in RAM. You have one 64-bit refcount for every 64MB of storage, irrespective of how much is deduplicated; that's a reduction factor of 8M, or 128k per terabyte stored.

This largely exists so you can avoid doing extra work on free.
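
The numbers in this comment can be reproduced with a rough sketch like the one below. It models only what the comment states (one 8-byte counter per 64MB region, read here as 64 MiB) and is not the actual ZFS data structure. The helper names, and the idea of counting regions for a contiguous, known offset, are assumptions for illustration.

```python
MIB = 1024**2
GIB = 1024**3
TIB = 1024**4

REGION_BYTES = 64 * MIB  # one counter per 64 MiB of storage (from the comment)
COUNTER_BYTES = 8        # one 64-bit refcount


def reduction_factor() -> int:
    """Bytes of storage covered per byte of counter."""
    return REGION_BYTES // COUNTER_BYTES


def table_bytes(storage_bytes: int) -> int:
    """Counter table size for the given amount of storage."""
    regions = -(-storage_bytes // REGION_BYTES)  # ceiling division
    return regions * COUNTER_BYTES


def regions_spanned(offset: int, length: int) -> int:
    """Number of 64 MiB regions touched by one contiguous on-disk extent."""
    if length <= 0:
        return 0
    first = offset // REGION_BYTES
    last = (offset + length - 1) // REGION_BYTES
    return last - first + 1


if __name__ == "__main__":
    print(reduction_factor())         # 8388608, i.e. 8M
    print(table_bytes(TIB) // 1024)   # 128 (KiB per TiB)
    print(regions_spanned(0, GIB))    # 16
```

Under this model, a 1GB file laid out contiguously and aligned to a region boundary would touch 16 counters, and 17 if it starts mid-region. A fragmented file could touch more, which fits the "depends on how contiguous your data is" caveat earlier in the thread.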

ivanrt 1071 days ago

The show stopper for me is that this reflink will not persist across zfs send/receive, which is quite unfortunate.
