|
|
|
|
|
by houseabsolute
5757 days ago
|
|
I upvoted because I really hope to hear from someone who knows this stuff. Off the top of my head (as a non-expert), this sounds like a horrible hash function that is vulnerable to a plethora of attacks, but I don't know for sure. Why not just use the built-in hash function, or some function someone smarter than you has written? |
|
This is a horrible hash function for a number of reasons. You want your hash function to use every bit of information from the data you're keying on because that's more likely to give you a spread that doesn't contain a collision. Consider this list of keys:
aaz abz axz
Those three keys would collide, and for no reason. Even adding up each letter (another horrible hash function because it eliminates information about the position of the character, so "the" and "eht" would collide) would be better than this.
Information theory is important here; you want to preserve as much information from your key as possible. Check out http://en.wikipedia.org/wiki/Entropy_(information_theory) for a good discussion on this.
Also, there is almost no earthly reason to use a hash function with 11 buckets, and you certainly wouldn't want to evaluate your hash based on that. Assuming you'd have to search each bucket of 10,000 for your match, hashing it into 11 sections buys you very little time.
Also, there's no reason not to do something more complicated; assuming your key is in CPU cache because you're going to add the first and last letters, why not at least add up all the letters? You're wasting free CPU cycles after you've already loaded the key from memory.
Finally, you don't want output that "looks pretty random." You want output that sorts exactly evenly between buckets. He's nowhere close.