by moonchrome 3992 days ago
>There's also a catch: if the hashcode of the string is 0, the hashcode will be recalculated every time (since the code assumes it has not been cached yet).
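The quoted catch is easy to reproduce. Below is a minimal sketch using a hypothetical class, CachedHashString (not java.lang.String). It is modeled on the Java 8 String.hashCode pattern, where a cached field value of 0 means "not computed yet". The string "\0\0" hashes to 31*0 + 0 = 0, so its loop runs on every call. "hello" has a nonzero hash, so it is computed only once.

```java
/**
 * Hypothetical stand-in for java.lang.String, modeled on the Java 8
 * String.hashCode caching pattern: a cached value of 0 means "not computed yet".
 */
final class CachedHashString {
    private final char[] value;
    private int hash;        // 0 doubles as the "not cached" marker
    int computations;        // how many times the hashing loop actually ran

    CachedHashString(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            computations++;      // cache miss: walk every character again
            for (char c : value) {
                h = 31 * h + c;  // the polynomial String.hashCode specifies
            }
            hash = h;            // storing 0 leaves the field looking "empty"
        }
        return h;
    }

    public static void main(String[] args) {
        CachedHashString normal = new CachedHashString("hello");
        CachedHashString zero = new CachedHashString("\0\0"); // 31*0 + 0 == 0
        for (int i = 0; i < 3; i++) {
            normal.hashCode();
            zero.hashCode();
        }
        System.out.println("\"hello\" hashed " + normal.computations + " time(s)");
        System.out.println("\"\\0\\0\" hashed " + zero.computations + " time(s)");
    }
}
```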

At least that part should be easy to fix by defining a hash function that never returns 0. The article even says they did that for JVM7, but it's gone in JVM8, with no explanation?
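Here is a sketch of the fix this comment suggests: compute the usual polynomial hash, then remap a result of 0 to some nonzero constant so that 0 can safely mean "not cached". NonZeroHash and ZERO_REPLACEMENT are hypothetical names, and the choice of 1 is arbitrary. One caveat applies. java.lang.String.hashCode's formula is specified in its Javadoc, so String itself could not remap its result without breaking that contract. A remap like this only works for a separate hash value or a custom class.

```java
/**
 * Hypothetical hash function: the standard 31-based polynomial, except that
 * a result of 0 is replaced by a nonzero constant.
 */
final class NonZeroHash {
    // Arbitrary replacement value (an assumption; any nonzero int would do).
    static final int ZERO_REPLACEMENT = 1;

    static int hash(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        // Remap 0 so a cached 0 can always mean "not computed yet".
        return h == 0 ? ZERO_REPLACEMENT : h;
    }

    public static void main(String[] args) {
        System.out.println(hash("hello") == "hello".hashCode()); // nonzero hashes unchanged
        System.out.println(hash("\0\0"));                        // remapped from 0
        System.out.println(hash(""));                            // remapped from 0
    }
}
```

The remap has a small cost. Strings that would hash to 0 now collide with strings that already hash to 1 (for example "\u0001").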


The hash32 implementation in Java 7 was not intended to fix the case where the actual hash value was zero, nor did it have an impact on that case as the hash32 value was stored in a separate field.

It was made to decrease the number of hash value collisions in large data structures (hash maps and such). Its replacement is described here: http://openjdk.java.net/jeps/180 . Given that most Strings never end up in a large collection, allocating an extra 4 bytes for every one of them was a waste.
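The collision problem that hash32 and its replacement target is easy to trigger on purpose. "Aa" and "BB" both hash to 2112 (65*31 + 97 = 66*31 + 66). Every same-length concatenation of them therefore shares one hashCode. The sketch below uses a hypothetical class, CollidingKeys, to build 2^n such keys and put them in a standard java.util.HashMap. Since Java 8, per JEP 180, HashMap turns a crowded bin into a balanced tree, so lookups still work and stay logarithmic rather than linear in the bin size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollidingKeys {
    /** Builds all 2^n concatenations of "Aa" and "BB"; they all share one hashCode. */
    static List<String> colliding(int n) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int i = 0; i < n; i++) {
            List<String> next = new ArrayList<>(out.size() * 2);
            for (String s : out) {
                next.add(s + "Aa");
                next.add(s + "BB");
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = colliding(10);  // 1024 distinct keys
        int h = keys.get(0).hashCode();
        boolean allSame = keys.stream().allMatch(k -> k.hashCode() == h);
        System.out.println(keys.size() + " keys, same hash: " + allSame);

        // Every key lands in the same bin, yet lookups still return the right value.
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), i);
        }
        System.out.println(map.get(keys.get(500)));
    }
}
```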

This post has been edited once for factual correctness.

It's hard to imagine a situation where the hash function would be so slow as to justify adding 4 bytes to every string.
If your strings are large, then calculating the hash will take time AND the extra 4 bytes for hash storage will be minimal extra overhead.
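The arithmetic behind this argument can be sketched as follows. Hashing runs one loop iteration per character, while a cached hash costs a fixed 4-byte field. The hypothetical helper below compares that field only against the character payload. It assumes a Java 8 style char[] at 2 bytes per char and ignores object headers and alignment padding, so real per-object numbers will differ.

```java
final class HashCacheOverhead {
    /** 4-byte cache field as a percentage of a UTF-16 payload (2 bytes per char). */
    static double overheadPercent(int length) {
        return 100.0 * 4 / (2.0 * length);
    }

    public static void main(String[] args) {
        for (int len : new int[] {8, 1_000, 1_000_000}) {
            // The hash loop runs len times; the cache field stays at 4 bytes.
            System.out.printf("len=%,d  hash loop iterations=%,d  cache field=%.4f%% of payload%n",
                    len, len, overheadPercent(len));
        }
    }
}
```

For short strings the field is a noticeable fraction of the payload, and hashing them is cheap anyway. For long strings the field is negligible, and that is exactly where caching saves the most work.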

You can easily find good & bad cases for all of these string implementations. They all have tradeoffs.