Hacker News new | ask | show | jobs
by dataflow 335 days ago
I haven't dug into the gory details either, but later they say:

> To fully generalize this into a robust data structure, we need:

> (1) A partitioning scheme that creates recoverable partitions with high probability

> (2) An iterative process that uses recovered values to unlock additional partitions

And they also say:

> With proper sizing (typically m > 1.22d cells), IBFs recover the full symmetric difference with very high probability.

It really doesn't sound like this is both exact and also running in linear time like XOR... right? Perhaps somehow the error is one-sided but then the time bound is probabilistic? If I'm missing something and they're truly maintaining both an absolute guarantee and an absolute time bound, this is mindblowing! But I don't get that impression?

1 comments

There is no absolute guarantee. You can have an arbitrarily large multiple and the decode can still fail when a set of entries exist that form a cycle, it just becomes quite unlikely as the overhead goes up.

One of the ways of hybridizing iblt and exact algebraic techniques like the minisketch library I link in my other post is to staple a small algebraic sketch to the iblt. If the iblt is successful you're done, if it gets stuck you use take the recovered elements out of the algebraic sketch and decode that. It's fast to decode the algebraic sketch in spite its O(n^2) behavior because it's small, and it'll always be successful if there are few enough elements (unlike the iblt).

Sadly this still doesn't give a guarantee since you might have more elements in a cycle than the size of the backup, but small cycles are more likely than big ones so there exists a range of sizes where it's more communications efficient than a larger iblt.