Hacker News new | ask | show | jobs
by macspoofing 482 days ago
Ooof. The core collections in Java are well understood to not be thread-safe by design, and this should have been noticed.

OP should go through the rest of the code and see if there are other places where collections are potentially operated by multiple threads.

>The easiest way to fix this was to wrap the TreeMap with Collections.synchronizedMap or switch to ConcurrentHashMap and sort on demand.

That will make the individual map operations thread-safe, but given that nobody thought of concurrency, are you sure series of operations are thread-safe? That is, are you sure the object that owns the tree-map is thread-safe. I wouldn't bet on it.

>Controversial Fix: Track visited nodes

Don't do that! The collection will still not be thread-safe and you'll just fail in some other more subtle way .. either now, or in the future (if/when the implementation changes in the next Java release).

>Sometimes, a detail oriented developer will notice the combination of threads and TreeMap, or even suggest to not use a TreeMap if ordered elements are not needed. Unfortunately, that didn’t happen in this case.

That's not a good take-away! OP, the problem is you violating the contract of the collection, which is clear that it isn't thread-safe. The problem ISN'T the side-effect of what happens when you violate the contract. If you change TreeMap to HashMap, it's still wrong (!!!), even though you may not get the side-effect of a high CPU utilization.

---------

When working with code that is operated on by multiple threads, the only surefire strategy I found was to to make every possible object immutable and limiting any object that could not be made immutable to small, self-contained and highly controlled sections. We rewrote one of the core modules following these principles, and it went from being a constant source of issues, to one of the most resilient sections of our codebase. Having these guidelines in place, also made code reviews much easier.

4 comments

> That will make the individual map operations thread-safe, but given that nobody thought of concurrency, are you sure series of operations are thread-safe? That is, are you sure the object that owns the tree-map is thread-safe. I wouldn't bet on it.

This is a very important point! And, in my experience, a common mistake by programmers who aren't great at concurrency.

Lets say you have a concurrent dynamic array. The array class is designed to be thread-safe for the explicit purpose that you want to share it between threads. You want to access element 10, but you want to be sure that you're not out of bounds. So you do this:

    if (array.size() > 10) {
        array.element(10).doSomething();
    }
It doesn't matter how thread-safe this array class is: these two operations combined are NOT thread-safe, because thread-safety of the class only means that `.size()` and `.element()` are on their own not going to cause any races. But it's entirely possible another thread removes elements in between you checking the size and accessing the element, at which point you'll (at best) get an out-of-bounds crash.

The way to fix it is to either use atomic methods on the class which does both (something like `.element_or_null()` or whatever), or to not bother with a concurrent dynamic array at all and instead just use regular one you guard with a mutex (so the mutex is held during both operations and whatever other operations other threads perform on the array).

A similar pattern (and overlooked very often) is with DB updates where you want to increment a number by 1, but split it into 2 roundtrips to the DB to get the existing value and set it to `value + 1`.

Concurrent transactions can screw it up, and you wouldn't notice until an escalation happens.

This kind of stuff is exactly why the original "thread-safe" collections in Java were deprecated in favor of the current one: locking on every operation means that a lot of trivial code "just works", but then things break without warning once you need consistency between consequent operations. And, at the same time, everybody is paying the locking penalty, even if they don't actually do concurrent access, or if they do their own locking.

Ironically, .NET repeated the same mistake with the same outcome - it's just less obvious because for .NET the switch from synchronized-by-default to unsync-by-default happened when generics were introduced, so type safety of the new collections was the part that devs generally paid the most attention to.

This is not a thread safety issue though, this is a transaction problem.
It’s absolutely a thread safety issue. It’s not a data race, but it’s a race condition and can very much cause unexpected behaviour e.g. out of bounds error even though on its face it has the correct preconditions.
Wrapping these 2 statements in a transaction will not solve this particular issue unless the transactions are run sequentially.
Unsynchronized, shared, mutable state are where data races happen. You need all three.

This means there are three ways to resolve this:

* Add synchronization with locks, etc.

* Don't share mutable access, e.g. single-owner model with channels.

* Make data immutable: an insight originally from purely functional languages. I believe Google invested quite heavily in this model with Guava.

Rust lets you choose which one of the three to give up, and it also statically prevents all three from happening at the same time.

> Don't share mutable access, e.g. single-owner model with channels.

That can still break if a reader makes multiple calls and implicitly assumes that the data structure at large hasn't been mutated between them.

This is equivalent to giving up a lock in between though, I think.
Maybe I misunderstood. I took the first bullet to refer to wrapping operations in a lock and the second to refer to single writer multiple reader.

At least in the single writer scenario it all breaks down if a lock free reader requires multiple items that are related to one another. It's an easy oversight to make though.

Fixing it sucks because you usually have to go and add locking to your previously fast and efficient lock free setup.

> At least in the single writer scenario it all breaks down if a lock free reader requires multiple items that are related to one another. It's an easy oversight to make though.

This is true -- generally the solution to this is to ask for everything in a single message.

It can also break if the read implementation has mutable side effects, e.g. rebalancing. Java HashMap used to have this causing an infinite loop. I haven’t looked recently but my guess is it still does.

https://mailinator.blogspot.com/2009/06/beautiful-race-condi...

‘Not thread safe’ really means ‘not thread safe’.

I mean, that's the "art of the deal". If you have a lock for all eternity, then you have a single owner, and are practically single-threaded.

The hard part is exactly all the combinations locks can be hold and given up, which will have a non-trivial number for more complex programs/data structures. That's where modeling software can help by iterating over all the possible cases.

This is basically what I thought: "Wait a second, is that data structure even thread-safe??" and then "Seems like you are just using the wrong data structure for your purpose.". Shoehorning with mutexes will usually lead to problems elsewhere and bottlenecks.

What you suggest to make objects immutable: Yes, and then one actually uses different data structures, which are made for being used as immutable data structures, making use of structural sharing, to avoid or limit an explosion in memory usage. So again its a case of "use the right data structures for the job". Using purely functional data structures makes one side of the problem simply disappear. The other side might not even come up in many cases: What if the threads depend on other versions of the data structure, which the other threads create? (need intermediate results of other threads). Then one has a real headache coming ones way, and probably needs again other data structures.

One additional idea I had, if one really wants to shoehorn it with the already used mutable data structure is, that one could write something that serializes the access attempts before they hit the data structure. That could also include grouping accesses into transactions and only running complete transactions. Then it makes me think whether it is close to implementing a database ...

So your solution to the the insanity that is the threading model was to reinvent the process model in it...

I don't intend to be mean, I just think the idea of independent execution in a shared memory environment is fundamentally flawed. And the effort in trying to get such follies working should have been put into better process models.

>So your solution to the the insanity that is the threading model was to reinvent the process model in it...

Where did you get that?

Immutability does not mean replicating the 'process model'. It means removing (limiting) an entire class of bugs from your multithreaded code. There will still be shared areas - and those should be controlled.