Hacker News new | ask | show | jobs
by michael_leachim 1094 days ago
cool, I enjoyed reading article and playing with visualizations. I liked it a lot.

How can I learn to design a hash function? It is possible to understand that stringSum is bad compared to murmur3 by evaluating it against test cases, but what properties make it bad. Is it summation compared to xoring in murmur3? I intuit that summation is kinda lossy, but ofc there is much more rigorous work put into it. It would be really cool to learn more about this. Thank you

3 comments

I'm actually not sure, to be honest. The post explicitly doesn't touch on the implementation of murmur3, and part of the reason is that while I can see what it's doing, I have no idea why it's doing it.

The initial announcement of murmur (https://tanjent.livejournal.com/756623.html) makes it seem like it's trial and error.

Murmur author here, it is a lot of trial and error but in general you're trying to pack as much "nonlinearity" as possible into as few instructions as possible. Multiplication is commutative, xor is commutative, but if you mix the two together you get functions that are strongly non-linear (in the algebra sense) because the mathematical "spaces" of integers and bits aren't connected together the same way. Ensuring that the nonlinearity applies to all the bits roughly equally is hard and requires trial and error.
The famous Bob Jenkins has lots of info on hash function design:

http://www.burtleburtle.net/bob/index.html

http://www.burtleburtle.net/bob/hash/index.html

Perhaps look at some of what paul khuong has written about umash; it is dense, but good, and includes good references.