|
|
|
|
|
by vertex-four
4625 days ago
|
|
Bitcoin uses a cryptographic hash function, SHA256. There are, at present, no known collisions with SHA256. This question is about non-cryptographic hash functions, as are commonly used in hash tables[1] which implement associative arrays, or sharding databases based on the hash of the record key. Non-cryptographic hash functions are simply not in the same league as cryptographic ones; cryptographic hash functions are optimized to make it infeasible for an attacker to generate a different message with the same hash, while a non-cryptographic hash function generally simply has to have uniformly random output over its keyspace. [1] http://en.wikipedia.org/wiki/Hash_table |
|
The original article did not descend into that obvious area of research. I see no particular reason why a hash algo that has the worst randomness shoved into the least significant byte (which I simply don't care about) might be an inherent result of smooshing the best randomness into its most significant nibble, which I do care very much about. Given the likely use case for a sharding hash, a smart hash designer would make sure that most of his effort is put into smooth distribution in MSB and perhaps totally ignore the LSB for a given amount of latency / computation / electrical power. After all, the actual users are more likely to hash based on the first 4 bits than the first 24 bits. Although you'll always run into people who think its funny to pull their shard subset out of the hash using the LSB (why?) or some random byte in the middle (why?)
I think MSB for shard key comes out of the tradition in the really olden days of sharding based on raw unhashed data. Sometimes thats random enough such that the MSB of the data makes an excellent shard key and hashing would just slow things down for a minimal gain, even today.