| tl;dr: In a multi-threaded context, memory reads and writes can be reordered by hardware. It gets more complicated once caches are in the picture. Imagine that core 1 writes to some address at (nearly) the same time that core 2 reads from it. Does core 2 read the old value or the new one? This is especially unclear if they don't share the same cache: core 1 might "write" to a given address, but the write only lands in core 1's cache and is then "scheduled" to be written out to main memory. Later, core 2 tries to read that address, finds it is not in its own cache, and pulls it from main memory before core 1's cache has flushed. As far as core 2 is concerned, the write happened after its read, even though physically the write finished in core 1 before core 2's read instruction might have even started. A memory barrier tells the hardware to ensure that a read issued before (or after) a given write to the same address is also observed as "happens-before" (or after) that write. It is often (but not always) implemented as a cache and memory synchronization across cores. I found Fedor Pikus's CppCon 2017 presentation [1] to be informative, and Michael Wong's 2015 presentation [0] filled in some of the gaps. C++, being a generic language for many hardware implementations, provides much more detailed concepts for memory ordering [2], which matters for hardware that has finer-grained barrier types than what most people are used to with x86-derived memory models. [0]: https://www.youtube.com/watch?v=DS2m7T6NKZQ [1]: https://www.youtube.com/watch?v=ZQFzMfHIxng [2]: https://en.cppreference.com/w/cpp/atomic/memory_order.html |
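To make the "happens-before" part concrete, here is a minimal sketch of the usual release/acquire pattern from [2]. The names `payload`, `ready`, and `publish_and_read` are made up for illustration, and this is only one way to express the ordering, not the only one. The writer does a plain write and then a release store to a flag; the reader spins on an acquire load of that flag and only then reads the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration: `payload` is ordinary data,
// `ready` is the atomic flag used to publish it.
int payload = 0;
std::atomic<bool> ready{false};

// Runs one writer/reader pair and returns the value the reader observed.
int publish_and_read() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread writer([] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish: earlier writes
                                                       // may not move below this
    });
    std::thread reader([&seen] {
        // Acquire load: once it observes `true`, everything the writer wrote
        // before its release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    writer.join();
    reader.join();
    return seen;
}
```

Calling `publish_and_read()` returns 42 because the acquire load synchronizes with the release store. If both operations on `ready` were `memory_order_relaxed` instead, the language would no longer guarantee that the reader sees `payload == 42` (and reading `payload` would be a data race), even though on x86 it would usually appear to work.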