by gopalv 2314 days ago

> Do you think Xor filter would be a better fit compared to Bloom here?

An 8-byte key is the only scenario where you should consider XorPlus (i.e. 8 bytes mapped to a long).

The lookup properties of the Xor filter are better in that case, but the real question is whether you have the entire collection available up front when you start building the bitset.

The sketch production isn't incremental - there is no add(k) after building it once.

So you can't add data once it is built, while Bloom filters do support adding entries after the fact (in fact, you can merge one Bloom filter into another, rather than sending all the new keys).
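To make the add-and-merge point concrete, here is a minimal Python sketch of a Bloom filter that supports both adding a key after construction and folding a second filter into the first. The class name, the sizes `m` and `k`, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration, not anything prescribed in this thread.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: m bits, k hash positions per key (assumed values)."""

    def __init__(self, m=1 << 16, k=4):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, key: bytes):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        # Incremental insert: works at any time after construction.
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: bytes) -> bool:
        # True means "possibly present", False means "definitely absent".
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def merge(self, other: "BloomFilter") -> None:
        # Union of two filters with identical parameters is a bitwise OR.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        self.bits |= other.bits


main = BloomFilter()
main.add(b"alice")

delta = BloomFilter()  # e.g. built on another node from the new keys only
delta.add(b"bob")

main.merge(delta)      # ship the bitset instead of the individual keys
```

The merge only works when both sides use the same size and the same hash functions, which is why the sketch rejects mismatched parameters.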

And both of those approaches are missing an unset operation.


devj 2314 days ago

Sorry.. Forgot to mention that.

Yes, insertion/deletion is required, and the frequency may be higher. The exact use case is that group membership depends on some conditions. We schedule this check every 15 minutes and, based on the result, add or delete members.


cakoose 2314 days ago

Xor filters require all the members of the set be provided up front.

Bloom filters allow adding members, but not removing them.

Cuckoo filters allow removing members: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

And just for completeness, storing everything losslessly in a btree, radix tree, or hash table would probably be under 300 kB. (80 kB just for the IDs plus, say, an additional ~2x overhead.)
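The size estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 10,000-member count and the 8-byte ID width are assumptions taken from figures mentioned elsewhere in the thread, and the 1% false-positive target for the Bloom comparison is an assumed value.

```python
import math

n_members = 10_000  # assumed member count
id_bytes = 8        # assumed ID width in bytes

raw_bytes = n_members * id_bytes            # the IDs alone
lossless_bytes = raw_bytes + 2 * raw_bytes  # IDs plus ~2x structural overhead

# Standard Bloom filter sizing: bits per key = -ln(p) / (ln 2)^2
fp_rate = 0.01      # assumed target false-positive rate
bits_per_key = -math.log(fp_rate) / (math.log(2) ** 2)
bloom_bytes = math.ceil(n_members * bits_per_key / 8)

print(f"raw IDs:        {raw_bytes / 1000:.0f} kB")
print(f"lossless total: {lossless_bytes / 1000:.0f} kB")
print(f"Bloom at 1%:    {bloom_bytes / 1000:.0f} kB")
```

Under these assumptions the lossless structure comes to about 240 kB against roughly 12 kB for the Bloom filter, so the exact structure costs around 20x more space but supports deletion and never returns a false positive.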


throwaway_pdp09 2314 days ago

Safe removal is only possible if you know the item is already in the filter. I feel the 'you can delete from cuckoo filters' bit is often oversold.


cakoose 2307 days ago

Oh wow, I only read the paper's abstract and was fooled. Thanks for the correction.


antman 2314 days ago

You can use a counting Bloom filter, which allows deleting. Depending on your update requirements, you might also be OK with two Bloom filters: one that has the group memberships and one that has the group removals.
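A counting Bloom filter replaces each bit with a small counter, so a removal can decrement what an insert incremented. The Python sketch below illustrates the idea; the class name, the parameters `m` and `k`, and the hashing scheme are assumptions for illustration. Note that removal is only safe for keys that really were added: the membership check in `remove` rejects definite absences, but it cannot catch a false positive.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: m counters, k positions per key."""

    def __init__(self, m=1 << 16, k=4):
        self.m = m
        self.k = k
        self.counts = [0] * m  # one counter per slot instead of one bit

    def _positions(self, key: bytes):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.counts[p] += 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.counts[p] > 0 for p in self._positions(key))

    def remove(self, key: bytes) -> None:
        # Guards against keys that are definitely absent. A false positive
        # would still pass this check and corrupt other keys' counters.
        if key not in self:
            raise KeyError(key)
        for p in self._positions(key):
            self.counts[p] -= 1


members = CountingBloomFilter()
members.add(b"alice")
members.add(b"bob")
members.remove(b"alice")  # alice left the group; bob's counters are untouched
```

The price is memory: a list of Python ints is far larger than a bitset, and even a compact implementation typically spends several bits per counter.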


im3w1l 2314 days ago

10k members every 15 minutes? Just rebuild it from scratch.


rbanffy 2314 days ago

Or use a Cuckoo filter.


the8472 2313 days ago

> The sketch production isn't incremental - there is no add(k) after building it once.

Which means they can't be used for distributed set intersections, unions or cardinality estimations, a not insignificant side-benefit of bloom filters.
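As a rough illustration of those side benefits, the Python sketch below builds two Bloom filters on separate "nodes", combines them with bitwise operations, and estimates cardinalities from the number of set bits using the usual estimator n ≈ -(m/k) · ln(1 - X/m), where X is the count of set bits. The constants `M` and `K`, the helper names, and the synthetic `user-N` keys are all assumptions made for the example.

```python
import hashlib
import math

M, K = 1 << 16, 4  # assumed size in bits and hash count; all nodes must agree


def positions(key: bytes):
    # Derive K positions from one SHA-256 digest via double hashing.
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % M for i in range(K)]


def build(keys) -> int:
    # Returns the filter as a Python int used as a bitset.
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits


def might_contain(bits: int, key: bytes) -> bool:
    return all((bits >> p) & 1 for p in positions(key))


def estimate_cardinality(bits: int) -> float:
    set_bits = bin(bits).count("1")
    if set_bits >= M:
        raise ValueError("filter is saturated; estimate is undefined")
    return -(M / K) * math.log(1 - set_bits / M)


# Two nodes with overlapping membership (keys 500..999 are shared).
node_a = build(f"user-{i}".encode() for i in range(0, 1000))
node_b = build(f"user-{i}".encode() for i in range(500, 1500))

union = node_a | node_b  # exactly the filter you would get from all keys
both = node_a & node_b   # keeps every shared key, possibly with extra false positives

est_union = estimate_cardinality(union)
# Inclusion-exclusion on the estimates gives the intersection size.
est_intersection = (
    estimate_cardinality(node_a) + estimate_cardinality(node_b) - est_union
)
```

Only the fixed-size bitsets cross the network here, never the keys themselves. The bitwise AND is an approximation: it never loses a shared key, but its false-positive rate can be higher than that of a filter built directly from the true intersection.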
