| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by somejan 5134 days ago

Implemented correctly (with using e.g. scrypt as the hashing component, and making sure the hashes are large enough so that the chances are neglegible of an attacker finding a match to a different hash than that was originally generated from the users password), this scheme would be no less secure than the traditional way of storing one scrypt hash per user.

The only effective difference would be that the entire database would become a single unit instead of a collection of separate hashes. Both an attacker and your webapp need to carry this extra weight of a monolithic blob of un-dividable data. It probably won't really slow down an attacker trying to brute force it if he has the data, but it may be more difficult to get the data in the first place.

But if an attacker has access to, say 10% of the hashes, he'll still be able to brute force 10% of the user accounts with weak passwords.

A different way to get a similar result of requiring a huge amount of data to be able to start cracking, would be to treat the database like a huge bloom filter: treat the database as a huge bit array of (say) a petabyte, hash the user's password with a hundred different hash functions (but with scrypt-like slowness), and use those 100 hashes as 100 indexes into the array to set the corresponding 100 bits. To verify a password, create those same 100 hashes and check if all 100 bits are set. Now, if an attacker has access to a part of the database, he won't be able to determine with certainty of any of his guesses at the user's passwords are correct.

Yet a third way to accomplish the same goal: pre-generate a petabyte of random data. To hash a user's password, apply a standard scrypt, then based on the resulting hash, generate a 100 pseudorandom offsets into the petabyte of data. At each of those 100 offsets, read a few (say, 16) bytes from our petabyte of random data, and finally store a hash of (scrypt_result + huge_data[offset1] + huge_data[offset2] + ... + huge_data[offset100]). You'd still have one hash per user, but to check a hash you also need access to a huge block of random data. The block of data functions in a way as an additional system-wide salt.

Anyway, there are more ways to get to a similar result as the OP's proposal. I'm not sure if it buys any additional security or if it's just more of a hassle for the webapp implementing this, but at least it's fun to think about.