Hacker News new | ask | show | jobs
by peteretep 3267 days ago
I am confused by:

    > cryptographic hashes such as sha1,
    > though widely used therefore are not
    > very good choices
I thought SHA1 was fast, and that was a reason to not use it in applications where brute-forcing might be an issue.

    > [the more times you hash it the fewer
    > false positives]
That doesn't match my understanding of how any even slightly reasonable hash function should work, doubly so if it's uniformly distributed.
4 comments

SHA1 is fast compared to key-derivation functions so you do not want to use it to hash passwords.

However SHA1 is slow compared to most hash functions (especially but not solely non-cryptographic hashes) so you don't want to use it for hash-based collections like hash tables or bloom filters.

edit: here's a bit I saved from tptacek (sadly I didn't keep the link, only the content):

* If you need random fixed-sized URLs, generate UUIDs; don't tie them to content, which can (a) change and (b) be predicted.

* For error detection, CRC schemes aren't weak. Against adversaries, MD5 is weak. For offline file integrity checking, or user-timescale online checking, use SHA256; at the very minimum, use an algorithm that hasn't been broken.

* Do not ever use the MD5(password) password scheme. MD5 is much faster than Unix crypt; even conventional Unix crypt is at least salt'ed to defend against rainbow table attacks, and modern adaptive hashing can be tuned to make dictionary attacks infeasable.

* MD5 is too slow for in-memory hash tables; I cringe when I see people use it. You're probably just hashing a string: use Torek's 31/37 hash. Otherwise, use Jenkins.

* PRNG design is hard. Just running MD5 over trivially small internal state doesn't yield a secure PRNG. Again, a problem other people solved that you have no business hacking on yourself, at least if your code matters.

* If you are concerned about collision attacks on your cryptosystem, which you should be if you're this guy and you're using MD5, use an algorithm that hasn't been broken; don't just jumble up one that already has. Kerckhoff's principal: look it up.

The bits you want are 3 and 4, but everything is good. Just sed s/MD5/SHA1/

The article is too informal and it might suggest that hash functions are iterated (taking the hash of the hash of the hash...), like in applications that need very expensive hash functions.

A Bloom filter needs multiple cheap and decently independent hashes of the input, which at least in principle could be computed in parallel. For example a FNV-family hash function, which has no arbitrary parameters nor a key input, could be applied to the concatenation of k different fixed integers with the input data to get k different, and approximately independent, hash values. The hash function would also have to be adjusted to obtain a uniform distribution over m distinct values rather than the usual much larger range.

I had added a paragraph with this idea but have since lost it accidentally :(
Bloom filters are data structure, in the same category as hashtable and trees.

They need a "hash" function to select an index for an item. A hash function in that context does not need to be cryptographic, it only needs to be quick and gives a good distribution of data.

SHA1 is slow, compared to a general purpose hash functions (typically murmurhash for bloom filters).

SHA1 lives in an inbetween world, it's not cryptographically secure, but it is slow when you don't actually care about cryptographic security. It has increasingly few modern uses.

Your second point I agree with, sounds like someone is using a very bad hash function there.