Fascinating. I've just been reading about probabilistic data structures in general, and would like to know if there is an efficient general method for generating statistics about their false negative (FN) rates. Are they related to ROC curves?
I haven't looked into the relationship with ROC curves.
I'm not aware of an efficient general method for generating statistics. Empirically testing the structures is the easy way. For Bloom filters that evict old data, you can calculate the FN probability as the probability that a given element is a duplicate but is reported as distinct.
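To make the empirical route concrete, here is a minimal sketch, not a definitive implementation. `ResettingBloom` and `measure_rates` are hypothetical names, and the eviction policy (wipe the whole filter after `capacity` insertions) is an assumption picked only so that false negatives can occur. The FN rate is counted exactly as described above: duplicates that the filter reports as distinct, checked against an exact set.

```python
import hashlib
import random


class ResettingBloom:
    """Toy Bloom filter that forgets everything after `capacity` insertions.

    The reset-on-full policy is an assumption made for illustration only.
    """

    def __init__(self, m_bits=4096, k=3, capacity=200):
        self.m, self.k, self.capacity = m_bits, k, capacity
        self.bits = bytearray(m_bits)  # one byte per bit, for readability
        self.inserted = 0

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        # Carve k 4-byte indexes out of a single SHA-256 digest.
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def seen_before(self, item):
        """Report whether `item` looks like a duplicate, then record it."""
        pos = self._positions(item)
        hit = all(self.bits[p] for p in pos)
        if self.inserted >= self.capacity:  # evict: wipe the whole filter
            self.bits = bytearray(self.m)
            self.inserted = 0
        for p in pos:
            self.bits[p] = 1
        self.inserted += 1
        return hit


def measure_rates(n_items=20000, universe=1000, seed=0):
    """Return (FN rate among true duplicates, FP rate among true first sightings)."""
    rng = random.Random(seed)
    filt, truth = ResettingBloom(), set()
    fn = fp = dups = distinct = 0
    for _ in range(n_items):
        x = rng.randrange(universe)
        is_dup = x in truth            # exact ground truth
        reported = filt.seen_before(x)
        if is_dup:
            dups += 1
            fn += not reported         # duplicate reported as distinct
        else:
            distinct += 1
            fp += reported             # new element reported as duplicate
        truth.add(x)
    return fn / max(dups, 1), fp / max(distinct, 1)


fn_rate, fp_rate = measure_rates()
print(f"FN rate: {fn_rate:.3f}, FP rate: {fp_rate:.3f}")
```

The stream here is uniform random integers; real rates depend heavily on how far apart duplicates arrive, so you would want to replay a trace that resembles your actual workload.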
Bloom and cuckoo filters, at least the classic variants, are designed to have a zero FN rate. Caches and these filters are basically inverses of each other: the filters have no false negatives, while caches have no false positives.
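A rough sketch of the filter side of that trade-off, assuming a textbook Bloom filter with no deletion or eviction. The `BloomFilter` class and its parameters are illustrative, not any particular library's API. Because bits are only ever set, every inserted element is always found, while elements that were never inserted can still collide into a false positive.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: bits are only ever set, never cleared."""

    def __init__(self, m_bits=2048, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        # Carve k 4-byte indexes out of a single SHA-256 digest.
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


bloom = BloomFilter()
members = range(300)
for x in members:
    bloom.add(x)

# Every inserted element must be found: zero false negatives by construction.
false_negatives = sum(x not in bloom for x in members)
# Elements never inserted may still match: these are false positives.
false_positives = sum(x in bloom for x in range(300, 10300))
print(false_negatives, false_positives)
```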
In the vast majority of situations where false negatives are okay, you're much better off just keeping a traditional cache of each object's hash.
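For illustration, a minimal sketch of that approach, assuming a bounded LRU cache keyed by SHA-256 digests. `HashCache` is a hypothetical helper, and LRU is just one possible eviction policy. Anything evicted becomes a false negative, and a false positive would require a digest collision.

```python
import hashlib
from collections import OrderedDict


class HashCache:
    """Bounded LRU set of object digests (hypothetical helper)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def seen_before(self, obj_bytes):
        """Return True if the object's digest is cached, else record it."""
        key = hashlib.sha256(obj_bytes).digest()
        if key in self._seen:
            self._seen.move_to_end(key)      # refresh recency
            return True
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)   # evict least recently used
        return False


cache = HashCache(capacity=2)
results = [cache.seen_before(b) for b in (b"a", b"b", b"a", b"c", b"b")]
print(results)  # [False, False, True, False, False]
```

The final `b"b"` is a duplicate reported as new because it was evicted when `b"c"` arrived, which is the false negative you accept with this design.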