Hacker News
by xKingfisher 956 days ago
I think it's just very confusing behavior that's hard to use correctly. And according to the article, it's very bug-prone in implementations.

In the example given, the result is that both writes happened before both reads, which directly contradicts the source. There's a valid explanation for why it happens, but it's still paradoxical.
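For anyone without the article open, here is a minimal sketch of the shape I mean. I'm assuming the example is the classic load-buffering litmus test. The function and variable names are made up for illustration, not taken from the article. Each thread reads one variable and then writes the other, all with relaxed ordering. As I understand the C++ memory model, the outcome `r1 == 1 && r2 == 1` is allowed, even though it reads as if both writes ran before both reads. Whether you ever observe it depends on the compiler and the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical litmus test; names are illustrative only.
// Returns the values (r1, r2) observed by the two threads.
std::pair<int, int> run_load_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);  // read x ...
        y.store(1, std::memory_order_relaxed);   // ... then write y
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);  // read y ...
        x.store(1, std::memory_order_relaxed);   // ... then write x
    });
    t1.join();
    t2.join();

    // Only 0 (initial value) and 1 are ever stored, so nothing else
    // can be read. Which combination shows up is not fixed.
    assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
    return {r1, r2};
}
```

Calling `run_load_buffering()` in a loop and counting the four possible outcomes is the usual way to poke at this. On many machines the (1, 1) case may never show up in practice, which is part of why it feels paradoxical.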

I remember at work a C++ standards committee member was giving examples of atomics and how to use them safely with different memory models, when someone pointed out his toy example for relaxed order was wrong. It took 5 people debating for a week to figure out what a safe/correct behavior would be. For a 10 line sample class.

As to your second question, the article recommends Hans Boehm's proposal to add a no-op conditional branch after the read. I guess it forces the load to be resolved and enforces a sequenced-before behavior on the individual threads. So at least one read must resolve before one write in the example.


Yes, relaxed memory semantics are exceedingly difficult to reason about. Hard to believe this is news to the C++ standards committee.

I find the premise of this article confusing, and assume it must be because I’m missing something crucial—obviously Hans Boehm knows what he’s talking about. Why do (opt in!) relaxed semantics need to be fixed?

> relaxed memory semantics are exceedingly difficult to reason about

People say this but I don't understand how? They're basically just like ordinary variables. Except even better, because you can use them from multiple threads. And it's guaranteed that a given thread will only ever see values that were actually written by the program (like it's guaranteed the compiler won't introduce spurious writes during optimization). There's no ordering guarantee, just like there's no ordering guarantee across threads for ordinary variables. If relaxed atomics are too hard then aren't ordinary variables too hard?

You can certainly use them in a way that's confusing, but that's not because they're themselves complicated - it's because you're writing confusing code. You can do that in a single thread too, with ordinary variables. It's not specific to relaxed atomics.

In fact, in my experience, the most confusing atomics are the sequentially consistent ones! I never know when I need to use them in practice vs. acquire/release.

> There's no ordering guarantee, just like there's no ordering guarantee across threads for ordinary variables.

This is what makes them hard to reason about. C++ makes “as if” ordering guarantees within a single thread trivial: If you can observe things happening out of the order in which your defined-behavior code sequenced them, you have a bug in your compiler or your CPU.

Relaxed ordering (on purpose) throws out all implicit sequencing guarantees when threads observe each other.

My confusion is that I believed this was all well-understood by implementers and programmers decades ago. That relaxed is harder than sequential to get right was as true on the Alpha as it is on modern ARMs. But this article seems to suggest what should be obvious consequences of relaxed semantics are unexpected & undesirable for C++!

> This is what makes them hard to reason about. [...]

I think we're speaking past each other.

You're implicitly assuming there is some piece of code using atomics, and you're saying it's harder to reason about its behavior when the atomics are made to be relaxed (versus, say, sequentially consistent). That's true enough, but it misses my point.

I'm coming at this from the opposite direction: I have certain kinds of atomics available, and then I'm using them to write the code. In such a scenario, it's quite easy to distinguish between "I might care about ordering" vs. "I don't care about ordering". In the first case, you probably shouldn't use relaxed atomics - good luck figuring out what to use, it may very well be tricky. If you insist on using relaxed atomics for those, it will definitely be tricky to get it right if it's at all possible, but that's when you should avoid relaxed atomics if at all possible. In the latter case, you should use relaxed atomics, and it's not tricky because you already know ordering doesn't matter.

Example: relaxed atomics are trivial to reason about in single-writer situations that aren't signaling any kind of event. Like when a worker is just trying to report progress to the foreground UI with minimal overhead. In this example, the writer (which is on a background thread) simply loads its progress indicator, adds 1, then stores back the variable. There isn't even a need for an atomic RMW, let alone any ordering. The worst case is the user will see "100% done" a few milliseconds too early, which is a non-issue (and already bound to happen due to rounding etc. anyway). Using relaxed atomics doesn't introduce any complexity in such a situation; it's basically exactly what you'd expect from ordinary variables without write-tearing.
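A minimal sketch of that progress-reporting pattern, assuming exactly one writer thread. The names `progress`, `worker`, `read_progress`, and `run_progress_demo` are hypothetical and only there for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical progress counter: one writer (the worker), any number of readers.
std::atomic<int> progress{0};

// Worker side: plain load, add 1, store. No atomic read-modify-write is
// needed because this is the only thread that ever writes `progress`.
void worker(int steps) {
    for (int i = 0; i < steps; ++i) {
        // ... one unit of real work would go here ...
        int done = progress.load(std::memory_order_relaxed);
        progress.store(done + 1, std::memory_order_relaxed);
    }
}

// UI side: the value may be slightly stale, but it is never torn.
int read_progress() {
    return progress.load(std::memory_order_relaxed);
}

// Runs the worker and polls from the calling thread, standing in for the UI.
int run_progress_demo(int steps) {
    progress.store(0, std::memory_order_relaxed);
    std::thread t(worker, steps);
    int last = 0;
    while (last < steps) {
        int now = read_progress();
        assert(now >= last);  // one thread never sees the counter go backwards
        last = now;
        std::this_thread::yield();
    }
    t.join();
    return read_progress();
}
```

The polling loop only uses the counter to draw a progress bar. Nothing else is inferred from its value, which is why relaxed is enough here.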

You run into problems if you try to depend on the "100% progress" indicator to mean "everything is finished" (so you can, say, clean up data structures), but at that point you clearly need ordering and shouldn't use a relaxed atomic to begin with; you'd want release semantics (or stronger), and those are trickier.
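Here is a rough sketch of what the ordered version could look like, under the assumption that a separate "finished" flag is what guards the shared data. The names `results`, `finished`, `produce`, `consume`, and `run_handoff` are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical shared state: ordinary data plus a flag that publishes it.
std::vector<int> results;            // non-atomic, written only by the producer
std::atomic<bool> finished{false};

void produce() {
    for (int i = 0; i < 100; ++i) {
        results.push_back(i * i);
    }
    // Release: the writes above become visible to any thread that
    // observes `true` through an acquire load of `finished`.
    finished.store(true, std::memory_order_release);
}

std::size_t consume() {
    // Acquire: pairs with the release store in produce().
    while (!finished.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return results.size();  // safe to touch `results` from here on
}

std::size_t run_handoff() {
    results.clear();
    finished.store(false, std::memory_order_relaxed);
    std::thread producer(produce);
    std::size_t n = consume();
    assert(finished.load(std::memory_order_relaxed));
    producer.join();
    return n;
}
```

If both operations on `finished` were relaxed instead, reading `results` after the loop would be a data race, since nothing would order the `push_back` calls before the read.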

> My confusion is that I believed this was all well-understood by implementers and programmers decades ago.

I definitely agree with this part. I didn't think in 2023 I would read that programmers are shocked that relaxed atomics can be reordered. That's... their entire point. It's like being shocked that compilers propagate constants during optimization.

> I didn't think in 2023 I would read that programmers are shocked that relaxed atomics can be reordered.

At my current and prior jobs, I've had to do the "C++ cleanup" job while porting some x86_64 code to ARM. The TSO model from x86 is just something I think a lot of people have assumed and haven't been hurt by (again because it's TSO). Most of the work I had to do is basically using all the static analysis and sanitizers I could, staring hard at the code to convince myself there wasn't some secret data dependency on some relaxed atomic.

It's alarming to hear "It's just a simple flag, so I'll use a relaxed atomic, nbd" from the same person who assumes another structure they've "published" (e.g. to a shared queue) will be visible.

Isn't the problem elsewhere here? The fact that people are using an API whose entire purpose they don't understand? "I passed a flag without understanding at all what it's supposed to do" doesn't seem specific to atomics at all.

This feels exactly like crossing an intersection without looking around despite noticing a sign saying "blind intersection", then being surprised when you get hit, then blaming it on the sign being "tricky" to reason about. There was nothing tricky about the logic... you just ignored it.

Context matters. In the context of a single thread, ordinary variable reads and writes happen as if they were in program order. So in that context, they are extremely natural. When shared across threads, reading and writing ordinary variables happens under a lock, so everything still appears as if in program order.

Atomics are used in a context of being shared across threads without locks. In that context, reasoning about them makes you actually think about memory ordering in a way that wasn't there with ordinary variables.

Sequentially consistent atomics are the easiest to reason about because all seq_cst atomic operations within a thread happen in program order (and are visible to other threads in that order). Relaxing that to acquire/release/relaxed loosens that, so it requires more thinking about what needs to be visible and in what order. It's only complicated because seq_cst has a performance cost, so you typically want to think about whether it's necessary or not for the particular use case. But reasoning about what is happening with an algorithm when using seq_cst atomics is always at least as easy as when using looser ordering.
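One place where the difference shows up is the store-buffering litmus test, sketched below with hypothetical names. Each thread writes its own variable and then reads the other one. With `seq_cst` on every operation, both threads reading 0 is forbidden, because all four operations fall into a single total order. With acquire/release or relaxed, that outcome is allowed, and as far as I know it can actually be observed even on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical store-buffering litmus test; names are illustrative only.
// Returns true if both threads read 0, which seq_cst rules out.
bool both_read_zero_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();

    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appears.
int count_both_zero(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_read_zero_once()) {
            ++hits;
        }
    }
    return hits;
}
```

Swapping the four operations to `memory_order_release` stores and `memory_order_acquire` loads keeps the code looking almost identical, but the count is then no longer guaranteed to be zero. That is the extra thinking the weaker orderings ask for.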