Hacker News
by kachapopopow 480 days ago
Yep, ran into this way too many times. Performing concurrent operations on non-thread-safe objects in Java, or in pretty much any language, produces the most interesting bugs in the world.

Which is why you manage atomic access to non-thread-safe objects yourself, or use a thread-safe version of them when using them across threads.
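A minimal Java sketch of the two options in that comment. The `SharedMaps` class is hypothetical, and a parallel stream stands in for "multiple threads": either guard a plain `HashMap` with one lock used for every access, or switch to `ConcurrentHashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class SharedMaps {
    // Option 1: plain HashMap, but every access goes through the same lock.
    static int fillGuarded(int n) {
        Map<Integer, Integer> map = new HashMap<>(); // not thread-safe on its own
        Object lock = new Object();                  // the lock that "owns" the map
        IntStream.range(0, n).parallel().forEach(i -> {
            synchronized (lock) {                    // one thread inside put() at a time
                map.put(i, i * i);
            }
        });
        synchronized (lock) {                        // reads use the same lock
            return map.size();
        }
    }

    // Option 2: a collection designed for concurrent access.
    static int fillConcurrent(int n) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        IntStream.range(0, n).parallel().forEach(i -> map.put(i, i * i));
        return map.size();
    }
}
```

Both versions should report every key. Calling `put()` on the unguarded `HashMap` from several threads is the case that can lose entries or corrupt the map's internals.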

Multithreading errors are the worst to debug. In this case it's dead simple to identify at design time, and red flags should have gone up as soon as he started thinking about using any of the normal containers in a multithreaded environment.

Every time I think I'm sorta getting somewhere in my understanding of how to write code, I see a comment like this that reminds me that the rabbit hole is functionally infinite in both breadth and depth.

There's simply no straightforward default approach that won't have you running into and thinking through the most esoteric sounding problems. I guess that's half the fun!

It's not that bad. We just don't have the equivalent of GC for multi-threading yet, so the advice necessarily needs to be "just remember to take and release locks" (same as remembering to malloc and free).
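In Java, the "take and release" discipline with an explicit lock usually looks like the sketch below (hypothetical `LockedCounter` class). The `finally` block plays the role that remembering `free` plays in C: leave it out and an exception in the body leaks the lock.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

// Hypothetical counter whose lock must be released on every path.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count = 0; // guarded by lock

    void increment() {
        lock.lock();            // "take" the lock
        try {
            count++;
        } finally {
            lock.unlock();      // "release" it, even if the body throws
        }
    }

    long get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    // Increments from many threads; with the lock, no update is lost.
    static long run(int n) {
        LockedCounter c = new LockedCounter();
        IntStream.range(0, n).parallel().forEach(i -> c.increment());
        return c.get();
    }
}
```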

Hopefully someone will invent something like STM [1] in the distant year of 2007 or so [2]. It has actual thread-safe data structures, not just the current choice between wrong-answer-if-you-don't-lock and insane-crashing-if-you-don't-lock.

[1] https://www.adit.io/posts/2013-05-15-Locks,-Actors,-And-STM-...

[2] https://youtu.be/4caDLTfSa2Q?feature=shared

Rust takes pride in its 'fearless concurrency' (strict compile-time checks to ensure that locks or similar constructs are used for cross-thread data, alongside the usual channels and whatnot), while Go takes pride in its use of channels and goroutines for most tasks. Not everything is like the C/C++/C#/Java situation where synchronization constructs are divorced from the data they're responsible for.
Synchronization primitives in Go are just as divorced as elsewhere, sometimes even more so: it does have channels, but goroutines cannot yield a value, forcing you to employ a separate storage location together with WaitGroup/Mutex/RWMutex (which, unlike Rust's RwLock, is separate from the data too, although C# lets you model it to an extent). This has led the community to develop libraries like https://github.com/sourcegraph/conc, which attempt to replicate Rust's Futures / C#'s Tasks.
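For comparison, a rough Java analogue of the Future/Task style: each task returns its value, so there is no shared result slot to protect. `FutureSum` and the chunking scheme are hypothetical; `CompletableFuture.supplyAsync` runs on the common fork-join pool by default.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FutureSum {
    // Sums i*i for i in [0, chunks * perChunk), one task per chunk.
    static long sumOfSquares(int chunks, int perChunk) {
        List<CompletableFuture<Long>> futures = IntStream.range(0, chunks)
            .mapToObj(c -> CompletableFuture.supplyAsync(() -> {
                long s = 0;
                for (int i = c * perChunk; i < (c + 1) * perChunk; i++) {
                    s += (long) i * i;
                }
                return s; // the task *returns* its partial result
            }))
            .collect(Collectors.toList());
        // join() waits for each task and hands back its value, the job that
        // a WaitGroup plus shared storage would do in Go.
        return futures.stream().mapToLong(CompletableFuture::join).sum();
    }
}
```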
Writing to a channel of size 1 feels a lot like a yield to me; you can even do it in a loop.
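A capacity-1 queue used as a generator, sketched in Java with `ArrayBlockingQueue` standing in for a Go channel of size 1. The class name and the `-1` end-of-stream sentinel are assumptions for illustration; in Go you would simply close the channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueGenerator {
    private static final int DONE = -1; // hypothetical sentinel: squares are never negative

    static List<Integer> firstSquares(int n) {
        BlockingQueue<Integer> ch = new ArrayBlockingQueue<>(1); // like a channel of size 1
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    ch.put(i * i); // blocks while the previous value is still waiting
                }
                ch.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Integer> out = new ArrayList<>();
        try {
            for (int v = ch.take(); v != DONE; v = ch.take()) {
                out.add(v); // each take() is one "yielded" value
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }
}
```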

A task is an abstraction over those primitives in any language. To my knowledge, TBB's task graph abstracts over a thread pool using exactly that concept.

From what I've seen, Swift is the only language that properly handles concurrency. I'm taking another crack at Rust, but the fact that everyone uses Tokio for anything parallel makes me feel like the language doesn't have great support for concurrency; it just has decent typing, which isn't a surprise to anyone.

For C++, Abseil's thread annotations are quite nice for getting closer to the Rust style of locking. Of course, the Rust style is still much easier to understand and less manual.
None of them solve the problems associated with the general category of race conditions. You can trivially create livelocks and deadlocks with channels/message passing, and Rust only prevents data races, though ownership is definitely a step in the right direction.

(Well, Go is not even memory-safe under data races!)
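To make the deadlock point concrete: two threads transferring in opposite directions can each grab one lock and then wait forever for the other. A common mitigation, sketched here with a hypothetical `Account` class, is a global lock order (lowest id first). It removes this particular deadlock, but, as the comment says, not race conditions in general.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

// Hypothetical account type used only for illustration.
class Account {
    final int id;
    final ReentrantLock lock = new ReentrantLock();
    long balance; // guarded by lock

    Account(int id, long balance) { this.id = id; this.balance = balance; }

    // Locking "from" then "to" can deadlock against a transfer the other way.
    // Always locking the lower id first means no cycle of waiters can form.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs opposing transfers concurrently and returns the (conserved) total.
    static long runOpposingTransfers(int n) {
        Account a = new Account(1, 1_000_000), b = new Account(2, 1_000_000);
        IntStream.range(0, n).parallel().forEach(i -> {
            if (i % 2 == 0) transfer(a, b, 1); else transfer(b, a, 1);
        });
        a.lock.lock();          // read under the locks, in the same global order
        b.lock.lock();
        try {
            return a.balance + b.balance;
        } finally {
            b.lock.unlock();
            a.lock.unlock();
        }
    }
}
```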

Also, Java is one of the languages where you can just add `synchronized` as part of the method signature, and while this definitely doesn't solve the problem, I don't think "divorced from the data" is accurate.

Re: 'synchronized' and data. It is a good distinction to make because sync does indeed lock control, not data. With ACID transactions or STM, an atomic section will run as-if-sequentially, full stop, since the data is locked. With Java sync, you get 'no other thread is in these lines of code' and you have to hope that's enough for the system to run as-if-sequentially.
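A small Java illustration of "sync locks code, not data", using a hypothetical `Recorder` class: a `synchronized` method locks the `Recorder` instance, so two recorders sharing one list do not exclude each other. The safe version locks the object that actually owns the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical wrapper around a list that may be shared between wrappers.
class Recorder {
    private final List<Integer> shared; // the data actually being mutated

    Recorder(List<Integer> shared) { this.shared = shared; }

    // BROKEN: locks this Recorder's monitor. Two Recorders wrapping the same
    // list hold different monitors, so both can be inside add() at once.
    synchronized void recordUnsafe(int e) { shared.add(e); }

    // FIXED: every writer locks the list itself.
    void recordSafe(int e) {
        synchronized (shared) { shared.add(e); }
    }

    static int recordAll(int n) {
        List<Integer> list = new ArrayList<>();
        Recorder r1 = new Recorder(list), r2 = new Recorder(list);
        IntStream.range(0, n).parallel().forEach(i -> (i % 2 == 0 ? r1 : r2).recordSafe(i));
        synchronized (list) { return list.size(); }
    }
}
```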
I'd love to get some examples of Rust's best-practice shared-mutable-state code. So far when I ask around here I get answers equivalent to "Rust guarantees that you aren't doing that."
It's not a perfect situation, but C# has some dedicated collection classes for concurrent use - https://learn.microsoft.com/en-us/dotnet/api/system.collecti.... There are still some possible footguns, but knowing "I should use these collections instead of the regular versions" is less error-prone than needing to take/release locks at every single use site.
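Java's `ConcurrentHashMap` has the same kind of footgun: every individual call is thread-safe, but a get-then-put sequence is not. A sketch with a hypothetical `WordCounts` class; `ConcurrentHashMap.merge` is documented to perform the whole update atomically.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WordCounts {
    // Footgun: two threads can read the same old count, and one increment is lost.
    static void countRacy(Map<String, Integer> m, String w) {
        Integer old = m.get(w);
        m.put(w, old == null ? 1 : old + 1);
    }

    // merge() does the read-modify-write for this key as one atomic step.
    static void countAtomic(ConcurrentHashMap<String, Integer> m, String w) {
        m.merge(w, 1, Integer::sum);
    }

    static Map<String, Integer> count(List<String> words) {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        words.parallelStream().forEach(w -> countAtomic(m, w));
        return m;
    }
}
```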
Concurrent maps are generally worse in terms of being able to understand the system than either non-concurrent maps guarded by a lock, or a channel/actor model with single ownership. Data-parallel algorithms should also generally use map-reduce rather than writing into the same map concurrently.
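The map-reduce alternative, sketched with Java parallel streams (hypothetical class name): the non-concurrent `Collectors.groupingBy` gives each worker its own intermediate map and merges them at the end, so no map is ever written by two threads.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapReduceCounts {
    // Map: each worker counts into a private map. Reduce: partial maps are merged.
    static Map<String, Long> count(List<String> words) {
        return words.parallelStream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```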

I've written highly concurrent software with bog-standard hash maps plus channels. There are so many advantages to this style, such as events being linearized (and thus being easy to test against, log, etc).
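A minimal sketch of the "bog-standard hash map plus channels" style in Java, assuming a `LinkedBlockingQueue` as the channel and a hypothetical `Event` record (Java 16+). One owner thread applies every event to a plain `HashMap`; producers only ever touch the queue, so events are applied in one linear order that is easy to log or replay.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class OwnerThreadStore {
    // Hypothetical event: a key plus a delta to add to it.
    record Event(String key, int delta) {}

    private static final Event STOP = new Event("", 0); // sentinel to shut the owner down

    static Map<String, Integer> run(int producers, int eventsEach) {
        BlockingQueue<Event> inbox = new LinkedBlockingQueue<>();
        Map<String, Integer> state = new HashMap<>(); // only the owner thread touches this
        Thread owner = new Thread(() -> {
            try {
                for (Event e = inbox.take(); e != STOP; e = inbox.take()) {
                    state.merge(e.key(), e.delta(), Integer::sum); // one event at a time, in queue order
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        owner.start();

        Thread[] ps = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            ps[p] = new Thread(() -> {
                for (int i = 0; i < eventsEach; i++) inbox.add(new Event("hits", 1));
            });
            ps[p].start();
        }
        try {
            for (Thread t : ps) t.join();
            inbox.add(STOP);
            owner.join(); // join() makes the owner's writes visible to this thread
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return state;
    }
}
```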

> "just remember to take and release locks"

If only it were so easy.

STM is not going to ever be a production thing outside of purely functional languages.
That’s what everyone thought about affine types, too.
True! I've been following STM and HTM research work for a while, and it all seems quite niche unless all side effects are captured (which is something purely functional languages can do). There isn't a real path to scalability I think, which there was with affine types.

Optimistic concurrency in general is a useful design pattern in many cases, though.
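Optimistic concurrency in its simplest Java form is a compare-and-set retry loop: read the current value, compute a new one without locking, and publish it only if nothing changed in between. `OptimisticMax` and its `Snapshot` record (Java 16+) are hypothetical; bundling two fields into one immutable object lets a single CAS update both together.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

class OptimisticMax {
    // Hypothetical immutable snapshot: largest value seen and how many updates won.
    record Snapshot(long max, long updates) {}

    private final AtomicReference<Snapshot> ref =
        new AtomicReference<>(new Snapshot(Long.MIN_VALUE, 0));

    void offer(long v) {
        while (true) {
            Snapshot cur = ref.get();
            if (v <= cur.max()) return;                 // nothing to update
            Snapshot next = new Snapshot(v, cur.updates() + 1);
            if (ref.compareAndSet(cur, next)) return;   // no conflicting write: done
            // another thread changed ref first; retry against the fresh state
        }
    }

    long max() { return ref.get().max(); }

    static long maxOf(int n) {
        OptimisticMax m = new OptimisticMax();
        IntStream.range(0, n).parallel().forEach(i -> m.offer(i));
        return m.max();
    }
}
```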

The usual issue is code evolution over time, not the initial version which tends to be okay. You really want to have tooling strictly enforce invariants, and do so in a way that fails closed rather than open.

In other words, use Rust.

Tell that to inexperienced developers, or to anyone retrofitting multi-threading onto a massive single-threaded project.
I've been that developer making a single-threaded app multi-threaded. Best way to learn though!
Multi-threading - ain't nobody got time for that.
Yeah, our software politely waits for one customer to finish up with their GETs and POSTs before moving onto the next customer.

We have almost one '9' of uptime!

There are better ways than threading.
Yeah, like pretending you aren't
I ran into my share of concurrency bugs, but one thing I could never intentionally trigger was any kind of inconsistency stemming from removing a "volatile" modifier from a mutable field in Java. Maybe the JVM I tried this with was just too awesome.
Were you only testing on x86 or any other "total store order" architecture? If so, removing the volatile modifier has less of an impact.
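For reference, the textbook case `volatile` exists for is a stop flag polled in a loop. Even on x86, where hardware reordering rarely shows up, the JIT is allowed to hoist a non-volatile read out of the loop, so the worker might never observe `stop()`. Whether that actually happens depends on the JVM and its JIT. Class name and timings are hypothetical.

```java
// Hypothetical worker that spins until another thread asks it to stop.
class StopFlagWorker implements Runnable {
    // volatile makes the write in stop() visible to the spinning thread and
    // forbids hoisting the read out of the loop.
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // busy work would go here
        }
    }

    void stop() {
        running = false;
    }

    // Returns true if the worker noticed stop() within the timeout.
    static boolean stopsPromptly() {
        StopFlagWorker w = new StopFlagWorker();
        Thread t = new Thread(w);
        t.setDaemon(true);      // never keep the JVM alive if something goes wrong
        t.start();
        try {
            Thread.sleep(50);   // let the loop run for a bit
            w.stop();
            t.join(5_000);      // wait at most 5 s for the worker to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }
}
```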
I've universally found that even when I am convinced that I am OK with the consequences of sharing something that isn't synchronized, the actual outcome is something I wasn't expecting.
The only things that should be shared without synchronization are read-only objects whose initialization is somehow externally serialized with the accessors, and atomic scalars (C++ std::atomic, Java's java.util.concurrent.atomic classes, etc.).
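Both categories from that comment in one Java sketch (hypothetical names, Java 16+ for records): an immutable `Config` reached through a `final` field, which Java's final-field semantics make safe to read from any thread once the owning object has been constructed without leaking `this`, and an `AtomicLong` as the atomic scalar.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

class SharedWithoutLocks {
    // Read-only after construction: record components are final and the list
    // is an unmodifiable copy, so concurrent readers need no lock.
    record Config(String name, List<String> hosts) {
        Config {
            hosts = List.copyOf(hosts);
        }
    }

    // Final-field semantics: a thread that sees this object also sees 'config' initialized.
    private final Config config;
    // Atomic scalar: incrementAndGet is one indivisible read-modify-write.
    private final AtomicLong requests = new AtomicLong();

    SharedWithoutLocks(Config config) { this.config = config; }

    void handle() {
        if (!config.hosts().isEmpty()) {
            requests.incrementAndGet();
        }
    }

    static long simulate(int n) {
        SharedWithoutLocks s = new SharedWithoutLocks(new Config("svc", List.of("a", "b")));
        IntStream.range(0, n).parallel().forEach(i -> s.handle());
        return s.requests.get();
    }
}
```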
This is kind of a hot take but I actually prefer debugging races in C/C++ for this reason. Yes, the language prescribes insane semantics (basically none) when it happens, but in practice you’ll get memory corruption or other noisy issues pretty often, and the fact that races are mostly illegal means you can write something like thread sanitizer without needing source code changes to indicate semantics. Meanwhile in Java you’ll never have UB but often you’ll have two fields be subtly out of sync and it’s a lot harder to track this kind of thing down.
Some (maybe most?) operations on Java Collections perform integrity checks to warn about such issues, for example, a map throwing ConcurrentModificationException.
ConcurrentModificationException does not check threads; it triggers when it is already too late. It also triggers on the same thread if you remove elements from a collection while iterating over it.
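The same-thread case is easy to reproduce: removing from an `ArrayList` inside a for-each loop makes the iterator's next `next()` call throw. The Javadoc describes this fail-fast check as best-effort, so it is not guaranteed in every case. A sketch with a hypothetical class name, plus the two usual fixes.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SameThreadCme {
    // Single-threaded, yet this throws: the for-each iterator sees the list changed.
    static boolean removeInForEachThrows() {
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        try {
            for (Integer x : xs) {
                if (x % 2 == 0) xs.remove(x); // remove(Object), not remove(int index)
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Fix 1: remove through the iterator itself.
    static List<Integer> removeEvensWithIterator() {
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        for (Iterator<Integer> it = xs.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) it.remove();
        }
        return xs;
    }

    // Fix 2: let the collection do the removal.
    static List<Integer> removeEvensWithRemoveIf() {
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        xs.removeIf(x -> x % 2 == 0);
        return xs;
    }
}
```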
ConcurrentModificationException is typically thrown from an iterator when it detects that it’s been invalidated by a modification to the underlying collection. It’s harder to check for the case described in this article, which is about multiple threads calling put() concurrently on a non thread safe object.