Hacker News new | ask | show | jobs
by dragontamer 2401 days ago
That is a good point: the x86 model is "stronger" than acquire-release. Which is probably why it took so long for acquire-release to become beneficial. Any x86 coder who codes in acquire-release will not see any performance benefit on x86, because x86 implements "stronger guarantees" at the hardware level.

Well, that is until you enable gcc -O3 optimizations, which will move memory around, merge variables together, and other such optimizations that will follow the acquire-release model instead of TSO. Remember that the compiler has to consider the memory-consistency model between registers and RAM (when are registers holding "stale" data and need to be re-read from RAM?)

-------

The thing is, acquire-release is becoming far more popular and is the golden-standard that C++11 has more or less settled upon. C++11, ARM, POWER9, CUDA, OpenCL have moved onto acquire-release semantics for their memory model.

Next generation PCIe 5.0, CXL, OpenCAPI, are all looking at extending cache-coherence out to I/O devices such as NVMe flash and GPUs / Coprocessors. I'm betting that Acquire/release will become more popular in the coming years. TSO is too "strict" in practice, people actually want their reads-and-writes to "float" out of order with each other in most cases, especially when you're talking about a PCIe-pipe that takes 5-microseconds (20,000 clock-ticks!!) to communicate over.

1 comments

> That is a good point: the x86 model is "stronger" than acquire-release. Which is probably why it took so long for acquire-release to become beneficial. Any x86 coder who codes in acquire-release will not see any performance benefit on x86, because x86 implements "stronger guarantees" at the hardware level.

Yes, in a way it's a race to the bottom; code that works on TSO hw works on acquire-release hw, but not the other way around. There's only two ways to combat this race: education, and using concurrency libraries written by people who know what they're doing.

> acquire-release is becoming far more popular and is the golden-standard that C++11 has more or less settled upon

Hmm, how come? C++11 supports many different models, relaxed, acquire/release, and sequential consistency, with sequential consistency being the default for atomic variables. Now, acquire/release looks like a decent compromise between ease of hw implementation and programming complexity, but AFAICS it's not the anointed one true model.

To some extent I think that's a failing of the C++11 model. Instead of choosing one (sane) model, they made people choose between an array of models with subtle semantics. That's what the recent formal Linux kernel model did, although that's not ideal either, with the requirement to not be too different from the previous informal description and boatloads of legacy code. See http://www0.cs.ucl.ac.uk/staff/j.alglave/papers/asplos18.pdf

In general, it seems to me that progress is being made in formal memory models, and I hope that in some years time there will be some kind of synthesis giving us a model that is both reasonably easy to implement in hw with good performance, easy enough to reason about, as well as formally provable. We'll see.

> Hmm, how come? C++11 supports many different models, relaxed, acquire/release, and sequential consistency, with sequential consistency being the default for atomic variables. Now, acquire/release looks like a decent compromise between ease of hw implementation and programming complexity, but AFAICS it's not the anointed one true model.

Well, nothing will ever be "officially" blessed as the one true model. As the saying goes: we programmers are like cats, we all will be moving off in our own direction, doing our own thing.

Overall, I just think that "programmer culture" is beginning to settle down on Acquire-release semantics. Its just a hunch... but more-and-more languages (C++, CUDA), and systems (ARM, POWER, NVidia GPUs, AMD GPUs) seem to be moving towards Acquire-release.

And in the next few years, we'll have cache-coherency over PCIe 4.0 or PCIe 5.0 in some form (CXL or other protocols on top of it). A unified memory model across CPU, DDR4 RAM, the PCIe-bus, and co-processors (GPUs, FPGAs, or Tensor cores), and high-speed storage (Optane and Flash SSDs over NVMe) is needed.

The community is just a few years out from having a unified memory model + coherent caches across the I/O fabric. Once this "defacto standard" is written, it will be very hard for it to change. That's why I think acquire-release is here to stay for the long term. Its the leading memory model right now.