|
|
|
|
|
by mgunlogson
3494 days ago
|
|
Hmmm.... this might be an artifact of the python implementation? The insert speed should be roughly constant until the filter begins moving items into their alternate buckets(when one of the two buckets for that item is full). In my implementation this doesn't usually happen until well over 90% capacity. Insert speed will slow slightly as the filters fills up if the implementation uses a shortcut to determine the first empty bucket position by just checking if it's zero. Conceptually, if a bucket is empty you only need to check the first position before the insert. When the filter is half full you need to check ~%50 of bucket positions before an insert. In practice the bucket operations are so fast and buckets only hold 4 items each, I haven't noticed a difference. |
|
Since you've implemented one, maybe you can answer a couple more questions I have:
> Typically, a Cuckoo filter is identified by its fingerprint and bucket size. For example, a (2,4) Cuckoo filter stores 2 bit length fingerprints and each bucket in the Cuckoo hash table can store up to 4 fingerprints.
Wouldn't that fingerprint length normally be two bytes rather than two bits? The Python sample code given appears to use bytes.
Also, when you start doing deletion, isn't there a nonzero probability of causing false negatives because of fingerprint collisions? I guess a 2 byte fingerprint would make that probability acceptably small, but it doesn't seem to be zero. Correct?