Hacker News new | ask | show | jobs
by tripletao 2949 days ago
The difference is that you may be willing to accept a much higher false-positive rate than your attacker can. This is the same idea behind the old "flip a coin, and then raise your hand if either the coin came up heads or you have [embarrassing problem]" method to statistically count everyone with the embarrassing problem, without disclosing anyone's status with certainty. That's the same property your truncated hash achieves.

A Bloom filter could also be designed accordingly. I'm guessing this post's grandparent was thinking of the filter's natural false-positive rate, or you could add deliberate noise.

1 comments

If had a service where I wanted people to use it and only remember explicitly opt-out, I wouldn't want any false positives to the "have already opted out" question.
Let's say my list has 10^4 members, and there are 10^9 people worldwide. If I design for a 10^-4 false positive rate, then a list constructed by reverse-engineering my algorithm (whether it's a Bloom filter or a truncated hash or anything else) will be 91% false positives, 9% true positives. That's not a huge improvement, but I could imagine applications where someone judged it worth the ~one customer I inconvenience.

This raises fun questions of what it means to disclose a fact, when you're disclosing it probabilistically. Let's say that you tell me the yes/no answer to a question you consider private. I then generate a uniform random number X on [0, 1], and disclose (("you told me yes") || (X >= a)) for some agreed constant a.

If a = 1, then I've almost surely just disclosed your secret. If a = 0, then I've almost surely disclosed nothing. At what value of a do you start to care? That's a really messy question, depending on the social consequences of the information being disclosed (what fraction of innocent candidates would you reject to make sure your child's tutor isn't on the list of clients of a psychologist known for treating pedophiles?), and the other public information about you and about my population that an attacker can fuse to make a stronger estimate.

I don't think privacy-through-false-positives is a terribly effective tool. It's just the only possible tool for creating privacy when your rule is public (whether deliberately or after a breach)--so it's interesting to think about places where it could have some benefit.