Hacker News
by hyperpape 1461 days ago
G1's deduplication is nice, but note that it is a lot weaker than what String.intern does. G1 deduplicates the underlying byte array but leaves the String objects separate (so s1 == s2 still evaluates to false). So every copy still costs its own String object, header included.
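A minimal Java sketch of the identity point. G1's deduplication happens inside the collector and can't be observed from Java code, so this only shows what `==` and `String.intern()` do with two equal but separately allocated strings. The class and method names are made up for illustration.

```java
// Two equal strings allocated separately are distinct objects, so == is false
// even if the GC has quietly made them share one backing byte[].
// intern() returns the canonical instance from the JVM's string pool.
final class InternDemo {
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = new String("USA");
        String s2 = new String("USA");
        System.out.println(s1.equals(s2));                        // true: same contents
        System.out.println(sameObject(s1, s2));                   // false: two String objects
        System.out.println(sameObject(s1.intern(), s2.intern())); // true: one pooled instance
    }
}
```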

If you have (like one of our applications did) millions of copies of the string "USA" in memory, that's many megabytes that explicit deduplication can save but the garbage collector can't.
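One way to do explicit deduplication without String.intern is a canonicalizing map owned by the application. This is a sketch under assumptions: `StringDeduper` and `dedupe` are hypothetical names (not from the thread or from any library), and it just uses a plain `ConcurrentHashMap`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: hands back one shared String instance per distinct value,
// so equal duplicates can be dropped and garbage collected.
final class StringDeduper {
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

    // Returns the first instance seen with this content; later equal strings
    // are swapped for that instance.
    String dedupe(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap does not accept null keys
        }
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringDeduper deduper = new StringDeduper();
        String a = deduper.dedupe(new String("USA"));
        String b = deduper.dedupe(new String("USA"));
        System.out.println(a == b);         // true: one shared instance
        System.out.println(deduper.size()); // 1
    }
}
```

The map holds strong references, so it only makes sense for a bounded set of values like country codes; everything put in it stays reachable until the map itself is dropped.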

String.intern isn't the way, for all the reasons this post outlines, but just using G1 isn't the right approach either.


IIRC all of the concurrent GCs can dedupe now. Not just G1.

Hopefully, though, object headers will soon be negligible thanks to progress on Project Lilliput.

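Strings have an object header, an int for the hash code, a pointer to the array, and two one-byte fields (the coder and, in recent JDKs, a hashIsZero flag). Assuming a heap under 32 GB (so 4-byte compressed pointers), that's 24 bytes per String after padding to 8-byte alignment, even once the array is deduped. Lilliput is awesome, but with an 8-byte header the fields still add up to 18 bytes, which pads back up to 24.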
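A rough sketch of the arithmetic, assuming compressed oops and compressed class pointers (a 12-byte header today), the default 8-byte object alignment, and the field list of recent JDKs (`value`, `hash`, `coder`, `hashIsZero`). The `byte[]` is left out, since that's the part deduplication shares. Under those assumptions 12 + 10 = 22 bytes pads to 24, and with an 8-byte header 8 + 10 = 18 bytes also pads to 24. The class name is made up, and a tool like JOL would give the real layout for a particular JVM.

```java
// Hypothetical back-of-the-envelope calculator for java.lang.String's own
// footprint (excluding the byte[]), under the assumptions stated above.
final class StringFootprint {
    static final int ALIGNMENT = 8; // default object alignment in bytes

    // Round a raw size up to the next multiple of the object alignment.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Header plus fields: byte[] reference (4), int hash (4),
    // byte coder (1), boolean hashIsZero (1).
    static long stringSize(int headerBytes) {
        return align(headerBytes + 4 + 4 + 1 + 1);
    }

    public static void main(String[] args) {
        long copies = 1_000_000L;
        for (int header : new int[] {12, 8}) {
            long perString = stringSize(header);
            System.out.printf("header=%d bytes -> %d bytes per String, %,d bytes for %,d copies%n",
                    header, perString, perString * copies, copies);
        }
    }
}
```

Multiplied out, a million duplicate "USA" strings still hold about 24 MB in String objects alone after their arrays are shared.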