Hacker News new | ask | show | jobs
by dmurray 4201 days ago
> The trick is that the hash of a phone number captured on Facebook will look just like the hash of the same phone number captured in a brick and mortar store, so the two companies can match the numbers without actually trading them.

This just isn't possible. How many cents would it cost to brute-force hash all legal American phone numbers?

2 comments

This is correct. As a zero-knowledge proof it is, sadly, trivially broken. (Not just in a theoretical sense. As you mention there just isn't enough entropy going into the hash. It's like asking for your age and gender, but only providing a hash to some other party so they can verify that you are who you say you are, without your divulging what information you were just given. That doesn't work - you would leak everything, because 1-100 (age) M/F (gender) only has 200 possible values. 200 hashes later your counterparty knows what you were asking to check.)

But that doesn't mean this couldn't be done properly.

There are actual zero-knowledge proofs. I liked this primer. You should read it!

http://blog.cryptographyengineering.com/2014/11/zero-knowled...

It is 100% possible for there to exist absolutely zero-knowledge proof in many instances. (Such as the one in the article.)

So, it could be possible for example (I don't know an algorithm) to check whether a phone number you were given is in someone else's set of phone numbers - without either your learning what the other's set is, or the other learning what phone number you're testing.

If there is, the 'someone else' would still have to do some rate limiting. Otherwise, once again, I can just run this algorithm for all possible phone numbers.
No, no. Read my link :) The whole point is that the definition of 'zero-knowledge' is that (if you use a zero knowledge interactive protocol) you learn no further knowledge.

You are right that the hash-passing is not zero-knowledge. Rather than just verify whether it's in a set, you've accidentally revealed a number. So if you have a set of phone numbers and I have 1 to check if it's in that set, then checking hashes doesn't work - I would end up divulging my number if I used it.

Intersecting a set of observed identifiers against another set is an issue Moxie dealt with on the TextSecures blog[0] (for phone numbers at that). He evaluated some ZKP algorithms and basically concluded this problem is hard or impractical today. I imagine an ad/referral network has many of the same real time constraints and scalability issues.

[0] https://whispersystems.org/blog/contact-discovery/

thanks - this was interesting. Some of the numbers are interesting. So, for example, he says to have ten million clients updating daily, he would need to sustain 40 MB 116 times every second. . . that's like 37120 megabit so if you pay $10 per megabit monthly, that's $371,200 per month. It's pretty bad. On the other hand you do have ten million users.

so the numbers are off, but they're not five orders of magnitude off. If we're going to service ten million users, we might have something of a hosting budget for it.

So while I believe the author that the current trade-offs aren't goods, clever academic mathematics might help in the future. Cryptography itself came from there.

Many other apps hash phone numbers to "suggest" security and privacy as well, e.g. Secret. Other apps don't even bother to create the hashes and send the plain text phone numbers, e.g. WhatsApp.
Why on earth would you have to brute force all numbers? You can do this optimally: just hash the numbers you see (or get requests for).
So that I can cold call someone who might be interested in my product, even though DataLogix only told me the hash of their phone number.
You've already hashed all numbers on your end, so it's trivial to figure this out (hash the hash?). Hopefully I'm not thinking about this wrong, but it does seem feasible.