|
|
|
|
|
by juancn
404 days ago
|
|
Another possible mechanism for doing GC at scale (a variation on Asynchronous Reconciliation in the article) in some file/object store, is doing a probabilistic mark and sweep using bloom filters. The mark phase can be done in parallel building many bloom filters for the files/objects found. Then the bloom filters are merged (or'ed together essentially) and then a parallel sweep phase can use the bloom filter to answer the question: is this file/object live? The bloom filter then answers either "No" with 100% certainty or "Maybe" with some probability p that depends on the parameters used for the bitset and the hash function family. |
|