| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by blueplanet 3998 days ago
	Can someone explain to me why using a cached value for hashCode in an immutable string is a bad thing?

4 comments

abollaert 3998 days ago

I don't think he refers to the caching being a bad thing, but that it probably does not affect performance that much, since most Strings are likely to be small (and the number of times hashcode is called is also likely to be small).

There's also a catch : if the hashcode of the string is 0, the hashcode will be recalculated every time (since the code assumes it has not been cached yet).

link

moonchrome 3998 days ago

>There's also a catch : if the hashcode of the string is 0, the hashcode will be recalculated every time (since the code assumes it has not been cached yet).

At least that part should be easy to fix by defining a hash function that returns numbers != 0 - even the article says they did it for JVM7 but it's gone in JVM8 - with no explanation ?

link

Ironlink 3998 days ago

The hash32 implementation in Java 7 was not intended to fix the case where the actual hash value was zero, nor did it have an impact on that case as the hash32 value was stored in a separate field.

It was made to decrease the number of hash value collisions in large data structures (hash maps and such). Its replacement is described here: http://openjdk.java.net/jeps/180 . Given that most Strings never end up in a large collection, allocating an extra 4 bytes for every one of them was a waste.

This post has been edited once for factual correctness.

link

mtdewcmu 3998 days ago

It's hard to imagine a situation where the hash function would be so slow as to justify adding 4 bytes to every string.

link

joosters 3998 days ago

If your strings are large, then calculating the hash will take time AND the extra 4 bytes for hash storage will be minimal extra overhead.

You can easily find good & bad cases for all of these string implementations. They all have tradeoffs.

link

brandonbloom 3998 days ago

Can just bit-or 1.

link

wolf550e 3998 days ago

this halves the number of values. doing if (hashcode == 0) hashcode = 1; removes only one value.

link

nvarsj 3998 days ago

I assume it's because most of the standard Java collections (HashMap, HashSet) actually cache the hash codes themselves. So caching it again inside the object is a bit redundant.

However other data structures (notably the ones in guava), don't do this caching - which does hurt performance for any sort of heavy writes and/or complex object.

link

vbezhenar 3998 days ago

It's certainly not a bad thing, why do you think it might be?

link

mnw21cam 3998 days ago

It can be a bad thing if all programmers assume that the caching works, and it doesn't. That's where the bit in the article mentions strings that produce a hash code of zero. Zero is used as a sentinel for "hash code not yet calculated", so for those particular strings the code is recalculated every single time.

link

vbezhenar 3998 days ago

That's not worse than not caching at al. Hashing implementation usually is a black box. Actually simple if (h == 0) h = 0xdeadbeef would probably prevent that issue from happening (if that should be considered a real issue at all).

link

thescrewdriver 3998 days ago

The only negative was a small memory overhead.

link