Hacker News new | ask | show | jobs
by ScottBurson 3494 days ago
Yes, I understand why it slows down (though the additional details you've provided are helpful). My point was that slowing down is correctly described as a decrease, not an increase, in throughput.

Since you've implemented one, maybe you can answer a couple more questions I have:

> Typically, a Cuckoo filter is identified by its fingerprint and bucket size. For example, a (2,4) Cuckoo filter stores 2 bit length fingerprints and each bucket in the Cuckoo hash table can store up to 4 fingerprints.

Wouldn't that fingerprint length normally be two bytes rather than two bits? The Python sample code given appears to use bytes.

Also, when you start doing deletion, isn't there a nonzero probability of causing false negatives because of fingerprint collisions? I guess a 2 byte fingerprint would make that probability acceptably small, but it doesn't seem to be zero. Correct?

1 comments

The code should use bits, the point of the filter is to increase space efficiency so you need to pack everything together as close as possible. Having fingerprint sizes of only multiples of 8 bits doesn't make sense.

Also, from the quote, it's probably more accurate to say a Cuckoo filter is identified by it's fingerprint size, bucket size, and capacity. In most implementations the bucket size is fixed at 4, so you can define a filter with only fingerprint size and capacity.

And yes, deleting can cause false negatives but only in a roundabout way. Deleting will only cause false negatives if you delete something that wasn't added previously. Because the filter has false positives, sometimes it will tell you it's seen an item that it hasn't. If you delete one of these false positives it will cause a false negative. If you only delete items that have been added before, and in the case of duplicate items, added the same number of times, you won't get any false negatives.

Essentially if you saved a log of all the inserts then replayed it as deletions you would have a perfectly empty filter every time with no false negatives. I actually have a unit test that does this in my java Cuckoo filter :), it exorcised quite a few demons from the code.

If you're curious the unit test is sanityFillDeleteAllAndCheckABunchOfStuff() in https://github.com/MGunlogson/CuckooFilter4J/blob/master/src...