Hacker News
by karthikkolli 4302 days ago
We once maintained a hashmap whose key and value were the same String instance, to avoid duplication in a search application. Wouldn't that be more beneficial than leaving it to the GC if the application uses a lot of strings?

Edit: changed "avoid deduplication" to "avoid duplication"
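For reference, the manual approach described above might look roughly like the sketch below. `StringPool` and `canonical` are hypothetical names, not from the original application; the idea is just that the map's key and value are the same instance, so looking up any equal string hands back the first copy that was stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps one canonical String instance per distinct value.
class StringPool {
    // Key and value are the same instance, so a lookup with any equal string
    // returns the stored copy. (HashSet.add only reports whether the value was
    // new; it cannot hand back the element that is already stored.)
    private final Map<String, String> pool = new HashMap<>();

    String canonical(String s) {
        // putIfAbsent returns the existing value, or null if s was just stored.
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        // new String(...) yields two distinct objects with equal contents.
        String a = pool.canonical(new String("Smith"));
        String b = pool.canonical(new String("Smith"));
        System.out.println(a == b);      // true: both are the first stored instance
        System.out.println(pool.size()); // 1
    }
}
```

The trade-off versus GC deduplication: the pool holds strong references, so entries stay alive as long as the map does, and every caller has to remember to go through it. The GC approach needs no code changes, but it only shares the backing arrays, not the String objects themselves.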

2 comments

> We once maintained a hashmap with key and value as the instance of string

No love for hashset?

Why did you want to avoid deduplication? You can't even tell it has happened, as it only works on the char[] that is internal to the String. Did you find it didn't work as expected?
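A small illustration of why it is invisible: deduplication only swaps the internal array (a char[] in JDK 8; later JDKs use a byte[]), so String identity, equality and hash codes behave the same whether or not it ever runs. `DedupInvisible` and `build` are made-up names for this sketch.

```java
// Sketch: two equal strings built separately are distinct objects with separate
// backing arrays. Deduplication, if it later runs, may make them share one array,
// but none of the checks below can observe that.
class DedupInvisible {
    static String build(String first, String second) {
        // StringBuilder.toString() copies into a fresh backing array each time.
        return new StringBuilder(first).append(second).toString();
    }

    public static void main(String[] args) {
        String a = build("Sm", "ith");
        String b = build("Sm", "ith");
        System.out.println(a == b);                       // false: distinct String objects
        System.out.println(a.equals(b));                  // true: same contents
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```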
It was in a typeahead search application built on 20GB of names. Many of those names shared common first and last names, which were stored as separate strings. With deduplication, string memory was reduced to 20% of its original size.
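A rough way to estimate how much a data set like this could gain is to compare its total character count with the character count of its distinct values. The sketch below does that on a tiny hypothetical sample (not the application's data); it ignores object headers and array overhead, so it only approximates what the GC or an interning map would actually save.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Rough estimate of the fraction of character data left if each distinct
// value were stored only once. Ignores object headers and array overhead.
class DuplicationEstimate {
    static double retainedFraction(List<String> values) {
        long totalChars = 0;
        long distinctChars = 0;
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            totalChars += v.length();
            // add returns true only the first time a value is seen.
            if (seen.add(v)) {
                distinctChars += v.length();
            }
        }
        return totalChars == 0 ? 1.0 : (double) distinctChars / totalChars;
    }

    public static void main(String[] args) {
        // Hypothetical sample of first- and last-name tokens.
        List<String> sample = Arrays.asList("John", "Smith", "Mary", "Smith", "John", "Jones");
        System.out.println(retainedFraction(sample));
    }
}
```

For this sample, 18 of the 27 characters belong to distinct values, so about two thirds of the character data would remain.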

Will benchmark that application with -XX:+UseStringDeduplication.
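When benchmarking, it can be worth confirming from inside the process that the flag is actually on. This sketch uses the HotSpot-specific `HotSpotDiagnosticMXBean`; `DedupFlagCheck` and `dedupSetting` are made-up names. Note that on JDK 8 the flag only takes effect together with the G1 collector (`-XX:+UseG1GC`).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reports whether string deduplication is switched on for the running JVM,
// e.g. when launched as: java -XX:+UseG1GC -XX:+UseStringDeduplication ...
class DedupFlagCheck {
    static String dedupSetting() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException on JVMs that do not know this flag
        // (HotSpot before 8u20, or non-HotSpot VMs).
        return hotspot.getVMOption("UseStringDeduplication").getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = " + dedupSetting());
    }
}
```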

So what was the downside of deduplication? Why did you want to avoid it?
Sorry, "deduplication" was a typo. Corrected.
Ah, right. The reason they're doing it in the GC rather than in the mutator threads is that it only affects strings that live long enough to be evacuated. Short-lived strings don't get deduplicated, and probably don't need to be. Without the GC, I don't know how you'd automatically determine when deduplicating is worthwhile.
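A toy model may help show why the GC is a natural place for this. In the sketch below (not HotSpot's actual implementation; `ToyDedupGc`, `Str` and `collect` are invented for illustration), each simulated collection ages the surviving strings, and only strings that reach an age threshold are looked up in a content table and pointed at a shared array. HotSpot's G1 implementation exposes a comparable knob as `-XX:StringDeduplicationAgeThreshold`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of age-based deduplication, not HotSpot's implementation.
class ToyDedupGc {
    static final class Str {
        char[] value;    // stands in for the String's internal array
        int age;         // number of simulated collections survived
        boolean checked; // each string is considered for deduplication once

        Str(String s) {
            value = s.toCharArray();
        }
    }

    private final int ageThreshold;
    // Content -> the first array seen with that content.
    private final Map<String, char[]> table = new HashMap<>();

    ToyDedupGc(int ageThreshold) {
        this.ageThreshold = ageThreshold;
    }

    // Simulate one collection in which every string in liveSet survives.
    void collect(List<Str> liveSet) {
        for (Str s : liveSet) {
            s.age++;
            if (!s.checked && s.age >= ageThreshold) {
                s.checked = true;
                // Reuse an existing array with equal contents, or register this one.
                char[] shared = table.putIfAbsent(new String(s.value), s.value);
                if (shared != null) {
                    s.value = shared;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyDedupGc gc = new ToyDedupGc(3);
        Str a = new Str("Smith");
        Str b = new Str("Smith");
        List<Str> live = Arrays.asList(a, b);
        gc.collect(live);
        gc.collect(live);
        System.out.println(a.value == b.value); // false: not old enough yet
        gc.collect(live);
        System.out.println(a.value == b.value); // true: both now share one array
    }
}
```

A string that is dropped before it survives `ageThreshold` collections is never checked, which mirrors the point that short-lived strings are left alone.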