by ianamartin
2158 days ago
I don't know enough about Bloom filters or the math involved to say whether or not you are correct. But what you're saying sounds a lot like what proponents of various flavors of NoSQL, Eventual Consistency, etc. say. I.e., it sounds like you are saying that lots of people have been using it for a long time, it's just good enough, and the actual correctness isn't really that big a deal. I might be crossing domain boundaries here and thinking up stuff that really doesn't matter. I mean, Bloom filters have a known error built in (false positives). No one was ever supposed to use them except for statistical purposes anyway, and with a known error rate, it's fine to account for it in your models. But there's also something that feels a little weird about your point. It sounds like you're saying it's good enough for X, Y, and Z use cases; therefore it doesn't really matter how technically wrong it is. But again, I could be really off.
For example, if you size a Bloom filter a certain way, the "bad" math might tell you your false positive rate is 0.001%, while the "good" math might tell you it is 0.001002%. It makes no difference: the discrepancy is orders of magnitude smaller than the rate itself. (I made those numbers up, but I've used Bloom filters, and they should be in the ballpark for the sizes I've worked with.) The bad math might be, strictly speaking, incorrect, but it's a good enough approximation for all practical purposes.
This is different from Eventual Consistency, which has real practical implications because it gives up certain guarantees. Those limitations are real, and they have real consequences; they aren't just a rounding error in a number.
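To make the "bad math vs. good math" comparison concrete, here is a minimal sketch. The thread doesn't say which formulas are meant, so this assumes "bad" math is the textbook estimate (1 - (1 - 1/m)^(kn))^k for a filter of m bits, n inserted items and k hashes, and "good" math is the exact probability under the usual model of k independent, uniformly random hash functions. The exact value comes from conditioning on X, the number of set bits: the textbook formula effectively computes (E[X]/m)^k, while the true rate is E[(X/m)^k], so by Jensen's inequality the textbook number can only be too low. The function names (`fp_exact`, `fp_classic`, `fp_exp`) are made up for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial


def stirling2_row(n: int) -> list[int]:
    """Return [S(n, 0), ..., S(n, n)]: Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for j in range(1, n + 1):
        new = [0] * (j + 1)  # S(j, 0) = 0 for j >= 1
        for i in range(1, j + 1):
            # Recurrence: S(j, i) = i * S(j-1, i) + S(j-1, i-1)
            above = row[i] if i < len(row) else 0
            new[i] = i * above + row[i - 1]
        row = new
    return row


def fp_exact(m: int, n: int, k: int) -> Fraction:
    """Exact false-positive probability, assuming k independent uniform hashes.

    Condition on X, the number of set bits after n insertions (k*n hash values):
    P(X = i) = C(m, i) * i! * S(k*n, i) / m**(k*n), and given X = i a query
    is a false positive with probability (i/m)**k.
    """
    s = stirling2_row(k * n)
    total = 0
    for i in range(1, min(m, k * n) + 1):
        total += comb(m, i) * factorial(i) * s[i] * i**k
    return Fraction(total, m ** (k * (n + 1)))


def fp_classic(m: int, n: int, k: int) -> float:
    """Textbook estimate: treats bits as independent, (1 - (1 - 1/m)^(kn))^k."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_exp(m: int, n: int, k: int) -> float:
    """The further-simplified form (1 - e^(-kn/m))^k."""
    return (1 - exp(-k * n / m)) ** k


if __name__ == "__main__":
    # Same 8 bits per element and k = 5 throughout; only the filter size changes.
    k = 5
    for m, n in [(64, 8), (256, 32), (1024, 128)]:
        exact = float(fp_exact(m, n, k))
        classic = fp_classic(m, n, k)
        approx = fp_exp(m, n, k)
        gap = (exact - classic) / exact
        print(f"m={m:5} n={n:4} k={k}: exact={exact:.6e} "
              f"classic={classic:.6e} exp-form={approx:.6e} relative gap={gap:.3%}")
```

The exact sum uses big integers and gets slow as k*n grows, so it is only practical for small filters like these. With the bits-per-element ratio held fixed, the relative gap should shrink as m grows, which is the point being made above: for realistically sized filters the "bad" formula is off by a rounding error.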