Hacker News new | ask | show | jobs
by mgunlogson 3493 days ago
The code should use bits, the point of the filter is to increase space efficiency so you need to pack everything together as close as possible. Having fingerprint sizes of only multiples of 8 bits doesn't make sense.

Also, from the quote, it's probably more accurate to say a Cuckoo filter is identified by it's fingerprint size, bucket size, and capacity. In most implementations the bucket size is fixed at 4, so you can define a filter with only fingerprint size and capacity.

And yes, deleting can cause false negatives but only in a roundabout way. Deleting will only cause false negatives if you delete something that wasn't added previously. Because the filter has false positives, sometimes it will tell you it's seen an item that it hasn't. If you delete one of these false positives it will cause a false negative. If you only delete items that have been added before, and in the case of duplicate items, added the same number of times, you won't get any false negatives.

Essentially if you saved a log of all the inserts then replayed it as deletions you would have a perfectly empty filter every time with no false negatives. I actually have a unit test that does this in my java Cuckoo filter :), it exorcised quite a few demons from the code.

If you're curious the unit test is sanityFillDeleteAllAndCheckABunchOfStuff() in https://github.com/MGunlogson/CuckooFilter4J/blob/master/src...