
by londons_explore 592 days ago

> offset the weight of all the unique entries in your dedup table

Didn't read the 7000 words... but isn't the dedup table a bunch of Bloom filters, so the whole table can be stored with ~1 bit per block?

When the filter says a block is likely a duplicate, you can add it to a table of candidate blocks, then find all the actual duplicates in a single scan later.

That avoids the massive accounting overhead of storing per-block metadata.
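
A rough Python sketch of the scheme described above, just to make the two passes concrete. This is not how any real filesystem does it: `BloomFilter`, `find_duplicates`, the filter size, and the hash count are all made up for illustration, and the blocks are assumed to sit in a list that can be scanned twice.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Toy Bloom filter; num_bits and num_hashes are illustrative choices."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            salt = i.to_bytes(8, "little")
            h = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
            yield int.from_bytes(h, "little") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True means "possibly seen" (false positives happen); False is definite.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def find_duplicates(blocks, num_bits=1 << 16, num_hashes=3):
    """Return sorted lists of block indices that hold identical content."""
    bloom = BloomFilter(num_bits, num_hashes)
    candidates = set()

    # Pass 1: only the filter is consulted. A hit marks the block's digest
    # as a *likely* duplicate; a miss records the block in the filter.
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if bloom.might_contain(digest):
            candidates.add(digest)
        else:
            bloom.add(digest)

    # Pass 2: a single later scan resolves the candidates. Digests that turn
    # out to have one location were Bloom false positives and are dropped.
    locations = defaultdict(list)
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        if digest in candidates:
            locations[digest].append(index)

    return sorted(idxs for idxs in locations.values() if len(idxs) > 1)


if __name__ == "__main__":
    data = [b"a", b"b", b"a", b"c", b"b", b"a"]
    print(find_duplicates(data))
```

In this sketch, per-block state is kept only for the candidate digests. A smaller filter produces more false positives and so a bigger candidate table, but the second scan still weeds them out.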