| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by nullc 336 days ago

iblt has low space efficiency for small sets, small elements, and low failure rates (and on that note is only probabilistic in its success).

We implemented https://github.com/bitcoin-core/minisketch which has optimal size efficiency-- N bits of state will always correctly recover when there are N or fewer bits of set difference, even when the set elements are small (like 32 bits, for example).

So for example you can you and I can each have sets of, say, ten thousand 32-bit elements which are identical except for 10 entries, and I can send you a 320 bit (32*10) sketch of my set and from that you can always determine the 10 (or fewer) differences. The same element and difference size with IBLT would likely take thousands of bits to have a low failure rate.

The downside is that the minisketch approach has quadratic decode complexity in the size of the set difference, but this is not a big deal when the number of differences is small by construction or thanks to recursive subdivision.

For cases where the differences are large iBLT eventually wins out-- the two ideas can also be hybridized in a variety of ways. E.g. using minisketch to make multi-element buckets in an iblt analogous to blocked bloom filter or the normal practice with cuckoo filters.

Another related scheme is cpisync which was used for many years by SKS key servers. It has communications efficiency like minisketch, but cubic decode costs.