| HN Mirror



	by fatdog 3484 days ago
	Fascinating. I'm just reading about probabilistic data structures in general, and would like to know if there is an efficient general method for generating statistics about their false negative (FN) rates. Are they related to ROC curves?

2 comments

jparkie 3484 days ago

I haven't looked into the relationship with ROC curves.

I don't know of an efficient general method for generating these statistics. Testing the structures empirically is the easy way. For Bloom filters that evict old data, the FN probability is the probability that a given element is a duplicate but is reported as distinct.
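To make the empirical approach concrete, here is a rough sketch of how the FN rate could be measured against ground truth. The `StableBloomFilter` class is a toy of my own, loosely modeled on a stable Bloom filter (cells decay at random so old entries fade out). Its name, parameters, and hashing scheme are assumptions for illustration, not any particular library's implementation.

```python
import hashlib
import random


class StableBloomFilter:
    """Toy evicting filter: cells decay at random, so old entries fade out."""

    def __init__(self, num_cells, num_hashes, max_value, decrements, seed=0):
        self.cells = [0] * num_cells
        self.num_hashes = num_hashes
        self.max_value = max_value    # value a cell is reset to on insert
        self.decrements = decrements  # random cells decayed per insert
        self.rng = random.Random(seed)

    def _positions(self, item):
        # Double hashing: derive all cell indexes from one SHA-256 digest.
        digest = hashlib.sha256(repr(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % len(self.cells) for i in range(self.num_hashes)]

    def seen_then_add(self, item):
        """Return True if the item is reported as a duplicate, then insert it."""
        positions = self._positions(item)
        reported_duplicate = all(self.cells[p] > 0 for p in positions)
        # Eviction step: decay a few random cells toward zero.
        for _ in range(self.decrements):
            p = self.rng.randrange(len(self.cells))
            if self.cells[p] > 0:
                self.cells[p] -= 1
        for p in positions:
            self.cells[p] = self.max_value
        return reported_duplicate


def measure_error_rates(stream, filt):
    """Compare the filter's answers against an exact set of seen items."""
    truth = set()
    fn = fp = duplicates = distinct = 0
    for item in stream:
        is_duplicate = item in truth
        reported = filt.seen_then_add(item)
        if is_duplicate:
            duplicates += 1
            fn += int(not reported)  # duplicate reported as distinct
        else:
            distinct += 1
            fp += int(reported)      # distinct reported as duplicate
        truth.add(item)
    return fn / max(duplicates, 1), fp / max(distinct, 1)


if __name__ == "__main__":
    rng = random.Random(42)
    stream = [rng.randrange(5000) for _ in range(50000)]
    fn_rate, fp_rate = measure_error_rates(stream, StableBloomFilter(4096, 3, 3, 10))
    print(f"FN rate: {fn_rate:.3f}, FP rate: {fp_rate:.3f}")
```

With `decrements=0` nothing is ever evicted, so the toy behaves like a classic Bloom filter and the measured FN rate is exactly zero. The measured rates depend heavily on the stream's repetition pattern, so treat any number from this as specific to the workload you feed it.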


throwawygybj 3484 days ago

Bloom and cuckoo filters, at least the classic ones, are designed to have a zero FN rate. Caches and these filters are basically inverses of each other: the filters have no false negatives, while a cache has no false positives.
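A minimal sketch of the filter side of that trade-off, assuming a textbook Bloom filter with double hashing over SHA-256 (the `BloomFilter` class and its sizes are made up for illustration). Bits are only ever set, never cleared, so every inserted item is always found, while some items that were never inserted are reported as present.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter; uses one byte per bit to keep the code simple."""

    def __init__(self, num_bits, num_hashes):
        self.bits = bytearray(num_bits)
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Double hashing: derive all bit indexes from one SHA-256 digest.
        digest = hashlib.sha256(repr(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % len(self.bits) for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


bloom = BloomFilter(num_bits=2048, num_hashes=3)
members = range(1000)
for x in members:
    bloom.add(x)

# Always 0: an inserted item's bits are all set and are never cleared.
false_negatives = sum(x not in bloom for x in members)
# Non-zero for a filter this small: unrelated items can hit bits that are already set.
false_positives = sum(x in bloom for x in range(1000, 11000))
print(false_negatives, false_positives)
```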

In the vast majority of situations where false negatives are okay, you're much better off just caching a hash of each object the traditional way.
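One way this could look, as a sketch only: a bounded LRU cache of SHA-256 digests. The `HashCache` class, its capacity, and the LRU eviction policy are my assumptions. A hit means the exact digest was seen before, so false positives would require a hash collision, while evicted entries show up as false negatives.

```python
import hashlib
from collections import OrderedDict


class HashCache:
    """Bounded LRU set of object digests."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.digests = OrderedDict()

    def seen_then_add(self, obj: bytes) -> bool:
        """Return True if obj's digest is cached, then record it as most recent."""
        digest = hashlib.sha256(obj).digest()
        if digest in self.digests:
            self.digests.move_to_end(digest)  # refresh recency
            return True
        self.digests[digest] = None
        if len(self.digests) > self.capacity:
            self.digests.popitem(last=False)  # evict the least recently used digest
        return False


cache = HashCache(capacity=2)
print(cache.seen_then_add(b"a"))  # False: first sighting
print(cache.seen_then_add(b"b"))  # False: first sighting
print(cache.seen_then_add(b"a"))  # True: still cached
print(cache.seen_then_add(b"c"))  # False: first sighting, evicts b
print(cache.seen_then_add(b"b"))  # False: a false negative, b was evicted
```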
