by ludde 1079 days ago
If I manually deduplicate/copy a 1GB file, how many such offset/refcount tuples would be created in RAM? Just one for the whole file, or one per underlying 128kB block?
2 comments

n.b. I am not pjd, this is from memory and may be wrong.

The answer to that is messy. Basically, there's a coarse table that should be kept in memory to answer "is there anything reflinked at all in this logical range on disk?", and each entry covers a large span, so how many entries you end up with depends on how contiguous your data is logically on disk. The precise per-vdev mapping list doesn't need to stay in memory continuously, just the coarser table, which saves a fair bit on memory requirements.

They don't go in RAM. You have one 64-bit refcount for every 64 MB of storage, irrespective of how much is deduplicated. That is a reduction factor of 8M (64 MB of storage per 8-byte counter), or 128 kB of counters per terabyte stored.
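To sanity-check those numbers, here is the arithmetic as a small Python sketch. The 64 MB region size and 8-byte counter are taken straight from the comment above and treated as binary units. The names are made up for illustration and nothing here is read from the actual implementation.

```python
# Figures assumed from the comment above: one 64-bit counter per 64 MiB of storage.
MIB = 1024 ** 2
TIB = 1024 ** 4

REGION_BYTES = 64 * MIB   # storage covered by one counter (assumed)
COUNTER_BYTES = 8         # one 64-bit refcount


def counter_table_bytes(storage_bytes: int) -> int:
    """Bytes of counters needed for storage_bytes, rounding up to whole regions."""
    regions = -(-storage_bytes // REGION_BYTES)  # ceiling division
    return regions * COUNTER_BYTES


reduction_factor = REGION_BYTES // COUNTER_BYTES
per_tib = counter_table_bytes(TIB)

print(reduction_factor)   # 8388608, i.e. 8 Mi
print(per_tib // 1024)    # 128 (KiB of counters per TiB of storage)
```

Note the table size depends only on how much storage there is, not on how much of it is cloned.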

This largely exists so you can avoid doing extra work on free.
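A toy model of why that helps, in Python. This is only a sketch of the general idea (a cheap per-region counter consulted before the expensive precise lookup). `CoarseRefTable`, its fields, and the dict standing in for the precise mapping are all hypothetical, and the 64 MB region size is again just the number from this thread.

```python
REGION_BYTES = 64 * 1024 ** 2  # region size assumed from the thread


class CoarseRefTable:
    """Hypothetical model: one counter per region plus a precise per-offset map."""

    def __init__(self, storage_bytes: int):
        regions = -(-storage_bytes // REGION_BYTES)  # ceiling division
        self.region_counts = [0] * regions  # small, always in memory
        self.precise = {}                   # offset -> extra refs; stands in for the precise mapping
        self.precise_lookups = 0            # how often the expensive path was taken

    def add_ref(self, offset: int) -> None:
        """Record one extra reference (a clone) to the block at offset."""
        if offset not in self.precise:
            self.region_counts[offset // REGION_BYTES] += 1
        self.precise[offset] = self.precise.get(offset, 0) + 1

    def free(self, offset: int) -> bool:
        """Return True if the block can really be freed, False if only a reference was dropped."""
        region = offset // REGION_BYTES
        if self.region_counts[region] == 0:
            return True  # fast path: nothing cloned in this region, skip the precise lookup
        self.precise_lookups += 1
        refs = self.precise.get(offset)
        if refs is None:
            return True  # region has clones, but not this block
        if refs == 1:
            del self.precise[offset]
            self.region_counts[region] -= 1
        else:
            self.precise[offset] = refs - 1
        return False


table = CoarseRefTable(1024 ** 3)          # 1 GiB of storage -> 16 regions
table.add_ref(0)                           # clone the block at offset 0
print(table.free(100 * 1024 ** 2))         # True, region 1 is clean so no precise lookup
print(table.free(0))                       # False, the clone's reference is dropped
print(table.free(0))                       # True, region 0 is clean again
print(table.precise_lookups)               # 1
```

Frees in regions with no clones never touch the precise mapping, which is the "avoid extra work on free" part.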