by MaxGanzII 1555 days ago
I may be wrong, but I don't see this working in theory.

The basic problem is that for any physical core, there is no guarantee that any writes will ever be seen by any other core (unless the necessary extra magic is done to make this so).

(There's a matching problem when reading, too, in that in theory writes made by another core can be never seen by the reading core, even if they have reached the domain covered by cache coherency, unless the necessary extra magic, etc.)

So here with this code, in theory, the writes being made simply are never seen by another core, and so it doesn't work.

As it is, in practise, writes typically make their way to other physical cores in a reasonably short time, whether or not the extra magic is performed; but this is not guaranteed. Also, of course, the order in which those writes emerge is not guaranteed (although some platforms provide additional guarantees; Intel is particularly generous in this regard, and performance suffers because of it).

> The basic problem is that for any physical core, there is no guarantee that any writes will ever be seen by any other core (unless the necessary extra magic is done to make this so).

They're using atomic read/writes with sequential-consistency.

This means that the compiler will automatically put memory-barriers in the appropriate locations to guarantee sequential-consistency, but probably at a significant performance cost.

The implementation is likely correct, but it's the slowest option available (seq-cst is the simplest / most braindead atomic ordering, but also the slowest, because you probably don't need all those memory barriers).

I'd expect that acq-release style barriers are possible for this use case, which would be faster on x86, POWER, and ARM systems.
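
To make the acquire/release suggestion concrete, here is a minimal sketch of release/acquire publication in C11. It is not taken from the code under discussion: the names `payload`, `ready_flag`, `publisher` and `run_acq_rel_demo` are hypothetical, and POSIX threads are assumed to be available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;            /* plain, non-atomic data (hypothetical name) */
static atomic_int ready_flag;  /* publication flag, 0 = not yet published */

static void *publisher(void *arg) {
    (void)arg;
    payload = 42;  /* ordinary write */
    /* release store: the write to payload is ordered before this store */
    atomic_store_explicit(&ready_flag, 1, memory_order_release);
    return NULL;
}

/* Hypothetical helper: starts one publisher and consumes on the caller. */
static int run_acq_rel_demo(void) {
    pthread_t t;
    payload = 0;
    atomic_store(&ready_flag, 0);
    int rc = pthread_create(&t, NULL, publisher, NULL);
    assert(rc == 0);
    (void)rc;
    /* acquire load: once 1 is observed, the payload write is visible too */
    while (atomic_load_explicit(&ready_flag, memory_order_acquire) == 0) { }
    int seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```

The pairing of one release store with one acquire load is the cheapest ordering that still makes the plain write to `payload` safe to read; whether it is sufficient for the barrier being discussed is a separate question.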

> They're using atomic read/writes with sequential-consistency.

In the C11 code? I'm not up to speed with that version of the spec, but unless there is behaviour invoked by the use of _Atomic, the code looks to me to be performing normal, non-atomic read/writes.

Another reply says there are full barriers being automatically used, but that doesn't address the problem I've described.

> but unless there is behaviour invoked by the use of _Atomic

That is the behavior invoked by "atomic".

The general gist is that "manual" read/write barriers were too difficult to think about and led to bugs. Instead of having the programmer put the read/write barriers in the correct locations, modern compilers put a type-specification on variables... and then it's the compiler's job to put the barriers in the correct location.

This turned out to be necessary, because the compiler's optimizer kept rearranging code (!!!) around memory barriers anyway. So the "compiler" needs to participate in any of the barriers, especially at higher levels of optimization (-O3 rearranges a lot of code). If the compiler needs to participate, you might as well have the compiler handle the placement of those barriers.

"Atomic" variables by default will be sequentially-consistent (ie: the _maximum_ use of memory barriers for _maximum_ levels of consistency).
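
As an illustration of that default, the sketch below pairs each plain operator on an atomic object with the explicit `<stdatomic.h>` call that has the same sequentially-consistent semantics. The function name `seq_cst_equivalents` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: each pair of statements has the same semantics. */
static int seq_cst_equivalents(void) {
    atomic_int x;
    atomic_init(&x, 0);

    x = 1;                                                   /* plain assignment ... */
    atomic_store_explicit(&x, 1, memory_order_seq_cst);      /* ... is a seq-cst store */

    int a = x;                                               /* plain read ... */
    int b = atomic_load_explicit(&x, memory_order_seq_cst);  /* ... is a seq-cst load */
    assert(a == b);

    x++;                                                     /* postfix increment ... */
    atomic_fetch_add_explicit(&x, 1, memory_order_seq_cst);  /* ... is a seq-cst RMW */

    return a + b + x;  /* 1 + 1 + 3 */
}
```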

You're talking here about memory barriers, but I was asking if atomic writes are in use.

Atomic writes have nothing to do with memory barriers. On modern platforms atomic writes are available without memory barriers of any kind.
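
In C11 terms, the closest thing to an atomic operation without barriers is `memory_order_relaxed`. The following is a rough sketch, assuming POSIX threads; `hits`, `relaxed_worker` and `run_relaxed_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { RELAXED_THREADS = 4, RELAXED_ITERS = 100000 };

static atomic_long hits;  /* shared counter (hypothetical name) */

static void *relaxed_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < RELAXED_ITERS; i++) {
        /* atomic, but imposes no ordering on surrounding memory accesses */
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Hypothetical helper: runs the workers and returns the final count. */
static long run_relaxed_demo(void) {
    pthread_t t[RELAXED_THREADS];
    atomic_store(&hits, 0);
    for (int i = 0; i < RELAXED_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, relaxed_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < RELAXED_THREADS; i++) {
        pthread_join(t[i], NULL);
    }
    /* pthread_join synchronizes with thread exit, so the total is exact */
    return atomic_load(&hits);
}
```

No increment is lost even though no ordering is requested. What relaxed operations do not provide is any ordering for other memory locations.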

Memory barriers do not cause events to become visible.

Only atomic writes do this; and so the question is not whether barriers are in use due to _Atomic, but whether atomic writes are in use due to the use of _Atomic.

https://en.cppreference.com/w/c/language/atomic

Objects of _Atomic type are read-modified-written atomically when you use a compound-assignment operator or the post-increment/pre-increment operators.

In these circumstances, the memory-barriers used for those atomic-operations will be of the sequentially-consistent memory model.

------

So yes, that "++*barrier" operation is read-modify-write atomically AND with memory-barriers in sequential-consistency style
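
For illustration only, here is a rough sketch of a one-shot spin barrier built on that kind of increment. It is not the code under discussion: `arrived`, `before`, `saw_all`, `arrive` and `run_arrival_demo` are hypothetical names, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { ARRIVALS = 4 };

static _Atomic int arrived;     /* shared arrival counter */
static int before[ARRIVALS];    /* plain writes made before arriving */
static int saw_all[ARRIVALS];   /* per-thread result, read after join */

static void *arrive(void *arg) {
    int id = (int)(intptr_t)arg;
    _Atomic int *barrier = &arrived;
    before[id] = 1;                   /* plain write before the barrier */
    ++*barrier;                       /* atomic seq-cst read-modify-write */
    while (*barrier < ARRIVALS) { }   /* seq-cst load in a spin loop */
    int ok = 1;
    for (int i = 0; i < ARRIVALS; i++) {
        ok &= before[i];              /* every earlier plain write is visible */
    }
    saw_all[id] = ok;
    return NULL;
}

/* Hypothetical helper: returns how many threads saw all earlier writes. */
static int run_arrival_demo(void) {
    pthread_t t[ARRIVALS];
    arrived = 0;
    for (int i = 0; i < ARRIVALS; i++) {
        before[i] = 0;
        saw_all[i] = 0;
    }
    for (int i = 0; i < ARRIVALS; i++) {
        int rc = pthread_create(&t[i], NULL, arrive, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    int total = 0;
    for (int i = 0; i < ARRIVALS; i++) {
        pthread_join(t[i], NULL);
        total += saw_all[i];
    }
    return total;
}
```

A thread that reads the final count has synchronized with every increment in the chain, so under the C11 model the plain writes to `before` are safe to read afterwards.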

-----

I don't believe that "atomics" alone are enough to solve this problem. At a minimum, I expect acq-consume to be required (not that the acq-consume model even works in practice... so acq-release for today's compilers). That's why I'm still talking about the memory model / fences here.

Not only is an atomic operation required, but the "publishing" of this information also needs to happen in the proper order. Fortunately, C11's _Atomic implementation solves both issues.

> So yes, that "++*barrier" operation is read-modify-write atomically AND with memory-barriers in sequential-consistency style

The quoted material does not describe this assertion.

This is true in a strict sense, but essentially no concurrent code works if you don't assume that writes become eventually visible to all other threads.

Consider something basic like implementing a mutex: once a thread unlocks a mutex, it is possible in theory that no other thread _ever_ sees the mutex as unlocked, and they all spin on it forever, but such a system would be useless.
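
A minimal spinlock sketch makes the dependence on eventual visibility explicit. It assumes C11 `atomic_flag` and POSIX threads; `lock_flag`, `guarded_total`, `spin_lock`, `spin_unlock`, `lock_worker` and `run_spinlock_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { LOCK_THREADS = 4, LOCK_ITERS = 25000 };

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long guarded_total;  /* plain variable protected by the lock */

static void spin_lock(void) {
    /* test-and-set returns the previous value; loop until we flip clear -> set */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) { }
}

static void spin_unlock(void) {
    /* waiters only make progress if this store eventually becomes visible */
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *lock_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < LOCK_ITERS; i++) {
        spin_lock();
        guarded_total++;  /* non-atomic update inside the critical section */
        spin_unlock();
    }
    return NULL;
}

/* Hypothetical helper: runs the workers and returns the guarded total. */
static long run_spinlock_demo(void) {
    pthread_t t[LOCK_THREADS];
    guarded_total = 0;
    for (int i = 0; i < LOCK_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, lock_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < LOCK_THREADS; i++) {
        pthread_join(t[i], NULL);
    }
    return guarded_total;
}
```

The ordering constraints guarantee that the total is exact if the program finishes. That it finishes at all relies on the cleared flag reaching the spinning threads in finite time.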

In practice, general purpose CPUs have writes become visible as fast as possible.

You won't find this behavior defined strictly in memory models, because how can you? These talk only about ordering and eschew things like "global clocks" and even if you had such a clock, how are you going to put a numerical bound on the delay?

Most will leave it as a quality of implementation issue and specify that writes become visible "eventually" if they say anything at all.

There is an implied seq-cst barrier in the atomic INC.

A barrier, of any kind, makes no difference here.

Barriers only influence the order in which events become visible if and when they DO become visible.

Barriers say nothing about whether or not events actually do become visible; and it may be that they never do (if the necessary magic is not performed, etc).

You mention an atomic increment. I don't see atomic increments in the C11 code, but I'm not up to speed with that version of the specification.

(Note that an atomic increment will solve the write-side problem, but not the read-side problem. The reader must still use a read barrier; but if there's an implicit full barrier, then that's being done.)

The fact that another thread sees the incremented value implies that that thread will see all stores that happened-before that incremented value.

A fence or atomic operation guarantees that a store will be visible globally in a finite amount of time.

Edit: that's true for C11 _Atomic as well (I had to double check), i.e. x++, if x is an _Atomic int, is both atomic and seq-cst.

> The fact that another thread sees the incremented value implies that that thread will see all stores that happened-before that incremented value.

Firstly, if the writer has used no barriers, writes can emerge in any order (and may never emerge at all, anyway, even if barriers were used). Secondly, if the reader has not used barriers, then even if writes have been written in some given order, the reader will by no means observe that order.

> A fence or atomic operation guarantees that a store will be visible globally in a finite amount of time.

A fence does NOT guarantee this. A fence controls, and only controls, the order of events. It has NO influence on whether events actually emerge. An atomic operation does, but this only solves the write side of the problem; the reader still needs to issue a read barrier to guarantee seeing atomically written values.

Sure, but the inc, in addition to the fence, also issues the required reads and writes.

An atomic write will need to be issued, and on the face of it, that C11 code is not issuing an atomic write. I could be wrong, but I would expect a particular function or macro to be used for an atomic write, and when it is not, you have normal writes.