Hacker News new | ask | show | jobs
by antirez 2216 days ago
Looks like a solid hash function indeed, with a very simple round.

Q: Is there any reason you don't use the same mixing function for the residue of the string (the final tail that is not a full 8 bytes block)? You could copy the bytes you have into a 8 byte array that is pre-padded with some pattern, and call the same mixing function again. Is this in order to save the distribution / avalanche properties in case the string is very short? In general would be nice to read some design note, if P and Q were obtained experimentally by checking for distribution or alike, if a different rotation length changes significantly the distribution properties and so forth.

1 comments

Great question, and thank you, Salvatore!

Some reasons for that are: different hashes for s and s\0 (without explicitly using length) to make it harder to create collisions, related to as you say better distribution/avalanche for short strings, also design-wise I like the asymmetry that strings not perfectly divisible by 8 will be treated a little differently. I just think that makes it better overall.

Thanks for the replies!
No probs. I think your idea of unifying the two loops is a good one.

A new version that uses the idea and overcomes the challenges would be great, and would simplify the code even more.

If you had a way to make a start on it, I welcome your PR.