Hacker News new | ask | show | jobs
by miohtama 1038 days ago
Note that concurrency of containers not specific to Python any way.

For example, Java implements different versions of containers for single thread and multithread usage, because multithreaded containers have obvious performance penalty

https://docs.oracle.com/javase/8/docs/api/java/util/concurre...

2 comments

> For example, Java implements different versions of containers for single thread and multithread usage, because multithreaded containers have obvious performance penalty.

Very few codebases in Java are single-threaded; specialty frameworks like Netty are the exception not the rule.

Likewise, there are not different containers for single threaded and multithreaded usage; there are containers that have different strategies for dealing with multi-threaded usage.

Hashtable is the oldest and is notable for it still being present and still being fundamentally flawed. It will lock reads and writes, but does not describe a way to lock for a transaction - e.g. changes based on read data. As such, it fundamentally has race conditions you can't protect against.

Hashmap and the rest of the Java 1.2 collections API set a slightly better pattern - they don't internally try to maintain safety, but provide mechanisms like synchronizedMap() to let you hold the monitor for the length of your transaction.

However, this could only be so good, because the monitors in Java are pretty fundamentally broken as well. A monitor is both part of its public API ( e.g. "synchronized(foo) {...}" ) and part of its implementation (e.g. public synchronized void bar() { ... })". This means that external code can affect your internal operation if you leverage the monitor that you get by default through your "this" instance.

As such, synchronized set involves three monitors:

1. The monitor on the interface-implementing collection type itself, e.g. on the HashMap. This likely is never used.

2. The monitor on the object returned by the 'synchronizedXXX' wrapping method. This is used to protect transactional access, such as iterating through while removing items.

3. The monitor used as a mutex inside the object returned by the 'syncronizedXXX' wrapping method, protect the integrity of the collection data type if used by multithreaded code which does not hold monitor #2. The code may have a race condition, but it won't put the collection itself into an inconsistent structural state.

The 'synchronizedXXX'-returned wrapper objects are pretty expensive, and if you can you should just internalize those collections into business object that does any needed syncronization itself.

ConcurrentHashMap and the like are lockless, and are built with the idea that you can perform the changes needed through atomic operations rather than transactions. This isn't always true, but often is.

For a collection which is always held by a single thread, the atomic operation overhead may still cause a performance impact - after all, the atomic operations are still processor state synchronization points. It is also possible to beat ConcurrentHashMap with regular HashMap on certain usage in multithreaded environments, when you are properly protecting access to the HashMap yourself.

It might be challenging to find scenarios where ConcurrentHashMap doesn't beat the 'synchronizedMap()' wrapper, just because the implementation itself is really expensive.