Hacker News new | ask | show | jobs
Show HN: A concurrent thread-safe hash map implemented in C++ (github.com)
17 points by kshk123 3438 days ago
4 comments

I feel that using a linked list as the underlying structure combined with fine grained locking is the wrong direction to go. What is the scenario in which I would use this collection over an unordered_map and some locking scheme? A linked list is a very (very very) poor performer and I would only use a linked list for implementing a collection which is synchronized based on CAS operations instead of locking. Since your implementation relies on linked lists and fine grained locking, I have a hard time imagining a scenario where I would gain utility. I would start by wrapping the std unordered_map and shared_mutex, testing and measuring the performance as a baseline and only then creating an alternative which could compete. If you want to make it interesting, at least make the synchronization lock free and gain some advantage from using a linked list.
Thanks for your feedback. The linked_lists can definitely be replaced, and I am working on finding a suitable data structure to maintain the individual buckets. However, the "unordered_map" with one global mutex will not achieve what I am trying to do, i.e., allow multiple threads to simultaneously write into the same hash_map, unless two (or more) of them collide on the same bucket at the same time.
It's true that a single global mutex will not provide such locking granularity, but unordered_map does provide bucket api allowing you to find a bucket by key. This means that for a pair p, you can find the relevant bucket at index i, lock mutex i and insert to the map. The only problem you need to deal with is rehashing which can be solved by wrapping the hash function with a modulo operation and playing around with the max_load_factor. (I could write a short sample later, it would be a very thin wrapper.)
A few issues with this implementation:

Is it really nessecary to allocate all of the hash buckets up front? Why not initialize them to null and then allocate them as you need?

Why not use some kind of managed container like std vector to store the hash buckets? This removes the need to do explicit memory management.

Why not use std forward_list instead of rolling your own linked list? Again, this will reduce the amount of code and save you from having to do manual memory management

Thank you very much for your review comments. Please find my answers below. All the hash buckets are allocated upfront so that there is no need for a global map. If I want to allocate the hash buckets later, there will be a need to take a hashmap-level-lock, which will hamper the concurrency of the map. Yes, using the std containers will make life easy. I was trying to have my own data structure in place, but there is no particular reason for it. I might move to the std containers in later version.
Curious if there are any tests.
Yes, the HastTest.cpp in the "src" directory has a number of test cases.
this doesnt work for collisions
Can you please elaborate, if there is a bug I would definitely like to know about it and fix it.