Hacker News new | ask | show | jobs
by codedokode 3001 days ago
> Facebook proposed using a common computer science technique called "hashing" to match individuals who existed in both sets

But Facebook has the source data from which hashes were generated so they can alway reverse them.

Furthermore, if you take a hash from name and surname, it can be easily reversed because the set of names and surnames is relatively small.

1 comments

Not if you use salt.
Even if you salt the hashes.

The problem is that the number of inputs is limited and it's trivial to enumerate over the input values. Let's take a contrived example: We have the data of a small, entirely made-up island where only two families live, so we have two surnames. Let's name them Foo and Bar. Now, they have an entirely funny tradition, they all get first names based on the order in which they were born (1). So we have Firstborn, Secondborn. Let's also, for simplicity assume that each couple gets exactly two children. That gives us the following 4 possible combinations of names:

    Firstborn Foo
    Firstborn Bar
    Secondborn Foo
    Secondborn Bar
Let's assume that there are 10 million of those people and we hash their names with a salt, that gives us 10 million unique hashes. But to break each hash, we only need to try at most 4 times, that's 40 million tries. Hashing speed varies from hash to hash and the hardware, but good old md5 easily achieved a few million hashes per second on a stock CPU in 2012. GPUs are usually around two orders of magnitude faster (2). So in the worst case, your desktop PC could break all those 40 million hashes in a few seconds without breaking a sweat. Better hashes are slower, but with such a limited input space, even the best hashes are breakable.

So no, salt's won't save you here.

(1) This is not entirely fictitious: https://nowiknow.com/wayan-balik/

(2) See for example the hashcat benchmarks http://thepasswordproject.com/oclhashcat_benchmarking and https://blog.codinghorror.com/speed-hashing/

Doesn't it only apply if the salt is known or constant? If I passed you hash(x) + aes(random_salt), would this attack work?
The salt must never be a constant, the entire point of a salt is that two identical inputs do not hash to the same value. However, it must be stored alongside the hash, so that you can later verify the hashed value. Many modern password hash functions (bcrypt for example) do store the salt as part of the hash.
That's not the point. Salt is constant, but different for each entry. They can encrypt the salt and when they share it with hospitals, those can't reverse the hash but FB can. Doesn't it solve the problem?
If you don't store the salt, then you might as well generate random numbers.