Hacker News new | ask | show | jobs
by inetknght 300 days ago
tl;dr:

In a multi-threaded context, memory reads and writes can be reordered by hardware. It gets more complicated with shared cache. Imagine that you have core 1 writing to some address at (nearly) the same time that core 2 reads from that. Does core 2 read the old or the new? Especially if they don't share the same cache -- core 1 might "write" to a given address, but it only gets written to core 1's cache and then "scheduled" to be written out to main memory. Meanwhile, later core 2 tries to read that address, it's not in its cache, so it pulls from main memory before core 1's cache has flushed. As far as core 2 is concerned, the write happened after it read from the address even though physically the write finished in core 1 before core 2's read instruction might have even started.

A memory barrier tells the hardware to ensure that reads-before is also "happens-before" (or after) a given writen to the same address. It's often (but not always) a cache and memory synchronization across cores.

I found Fedora Pikus's cppcon 2017 presentation [1] to be informative, and Michael Wong's 2015 presentation [0] filled in some of the gaps.

C++, being a generic language for many hardware implementations, provides a lot more detailed concepts for memory ordering [2], which is important for hardwares that have more granularity in barrier types that what most people are used to with x86-derived memory models.

[0]: https://www.youtube.com/watch?v=DS2m7T6NKZQ

[1]: https://www.youtube.com/watch?v=ZQFzMfHIxng

[2]: https://en.cppreference.com/w/cpp/atomic/memory_order.html

1 comments

Than you.