|
|
|
|
|
by bloomer
3003 days ago
|
|
I wish this was something that people brought up more prominently when discussing cuckoo filters. They fail catastrophically. A bloom filter gradually fails as the number of items inserted increases and the false positive rate tends to one. But a cuckoo filter that fills up the buckets for a fingerprint has to reject an insert meaning that either you start to have false negatives or to guarantee that you don't have false negatives you have to immediately return true for any query, which is a false positive rate of exactly one with a sharp discontinuity in behavior from before the failed insert. I think the salesmanship of cuckoo filters has been a little overdone and in most applications a bloom filter is a better choice. |
|
In some cases, like the when there's a high likelihood of duplicate entries, this can represent a catastrophic failure mode. This is a good point, and when I get around to updating the tutorial, I'll make note of this.
Regarding the failure mode, an insertion can be knowingly rejected without mutating the filter, so the filter remains useful for subsequent insertions and all reads, retaining all its guarantees. The only loss is the rejection of the item - in our case we used this as signal to rebuild the filter at a larger size.