
Let's say my list has 10^4 members, and there are 10^9 people worldwide. If I design for a 10^-4 false positive rate, then a list constructed by reverse-engineering my algorithm (whether it's a Bloom filter or a truncated hash or anything else) will hold about 10^5 false positives alongside the 10^4 true members: 91% false positives, 9% true positives. That's not a huge improvement, but I could imagine applications where someone judged it worth the ~one customer I inconvenience.
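To make that arithmetic easy to rerun with other numbers, here's a rough sketch. The function name and its parameters are my own, and it assumes the filter has no false negatives (true of a Bloom filter or a truncated hash) and that the attacker tests everyone in the population.

```python
def reconstructed_list_composition(members, population, false_positive_rate):
    """Return (true_share, false_share) of a list rebuilt by testing everyone.

    Assumes no false negatives: every real member matches the filter.
    """
    non_members = population - members
    false_positives = non_members * false_positive_rate  # expected count
    true_positives = members
    total = true_positives + false_positives
    return true_positives / total, false_positives / total


true_share, false_share = reconstructed_list_composition(
    members=10**4, population=10**9, false_positive_rate=1e-4
)
print(f"true positives: {true_share:.0%}, false positives: {false_share:.0%}")
```

Tightening the false positive rate to 10^-5 flips the balance to roughly half and half, which is why the rate has to be picked against the size of the whole population, not the size of the list.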

This raises fun questions of what it means to disclose a fact, when you're disclosing it probabilistically. Let's say that you tell me the yes/no answer to a question you consider private. I then generate a uniform random number X on [0, 1], and disclose (("you told me yes") || (X >= a)) for some agreed constant a.

If a = 1, then I've almost surely just disclosed your secret. If a = 0, then I've almost surely disclosed nothing. At what value of a do you start to care? That's a really messy question, depending on the social consequences of the information being disclosed (what fraction of innocent candidates would you reject to make sure your child's tutor isn't on the list of clients of a psychologist known for treating pedophiles?), and the other public information about you and about my population that an attacker can fuse to make a stronger estimate.
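One way to put a number on "how much did I disclose" is to ask what an attacker believes after seeing my output. The sketch below is only that, a sketch: the `prior` parameter (the attacker's belief that your answer is yes before I say anything) is my assumption, not part of the scheme, and both function names are made up for illustration. By Bayes' rule, if I disclose "true" the attacker's belief becomes prior / (prior + (1 - prior)(1 - a)); if I disclose "false", your answer was certainly no.

```python
import random


def disclose(secret_is_yes, a, rng=random):
    """Return the noisy disclosure: yes OR (X >= a), with X uniform on [0, 1)."""
    x = rng.random()
    return secret_is_yes or x >= a


def posterior_yes_given_true(prior, a):
    """Attacker's belief that the secret is yes after seeing a 'true' disclosure.

    `prior` is a hypothetical attacker belief before the disclosure.
    """
    denominator = prior + (1 - prior) * (1 - a)
    if denominator == 0:
        # prior == 0 and a == 1: a 'true' disclosure cannot happen at all.
        raise ValueError("a 'true' disclosure has probability zero")
    return prior / denominator


for a in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(a, posterior_yes_given_true(prior=0.01, a=a))
```

At a = 0 the posterior equals the prior (nothing learned), and at a = 1 it is 1 (secret fully disclosed), matching the two extremes above. In between, the answer depends heavily on the prior, which is exactly where the attacker's other public information comes in.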

I don't think privacy-through-false-positives is a terribly effective tool. It's just the only possible tool for creating privacy when your rule is public (whether deliberately or after a breach)--so it's interesting to think about places where it could have some benefit.