Hacker News new | ask | show | jobs
by danking00 751 days ago
I think it might help readers to include a narrative about an example application. Perhaps I’m in the minority but I tend to think of Bloom filters as a way to reliably know something isn’t in a set (e.g. so as to not run an expensive disk read). This data structure seems to view them the dual way: “this is maybe the right value for this key”.

I’ve seen that view work for visualizations like approximate CDFs and medians where I have some statement like “with probability p, the value differs from truth by less than e”. Is this data structure used in a similar way? My instinct is that visualizations having a low rate of being wrong is OK because the human will follow up that visualization with more tests. In the end you have lots of evidence supporting the conclusion.

1 comments

Ah, we need to clarify the language! The B-field will always return the correct value for an inserted key.

False positives are only returned for keys that have not been inserted. This is akin to a Bloom filter falsely returning that a key is in the set).

I second that "The B-field will always return the correct or Indeterminate value for an inserted key." before listing classes of errors would clarify it by a lot.