There is more nuance to it than this, but basically on x86 all memory writes are available to all cores via main memory, whereas on aarch64 they are not.

On x86, a write by core A to memory will be available to core B if core B reads from main memory.

On aarch64, a write by core A will not immediately get published to main memory (it will likely stay in cache: L1, L2, etc.), so even if core B tries to read from main memory it won't see the value from core A.

Ultimately aarch64's "weak"(er) memory model is more efficient, as the programmer/compiler can make more efficient memory accesses. This results in fewer cache invalidations between cores. The problem in practice is that tons of production code has been written which assumes the x86 memory model. In other cases the code simply has a concurrency bug that doesn't manifest on x86 but does on aarch64, as in the post.
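To make that concrete, here is a minimal sketch in C++ of the usual "write some data, then set a flag" pattern. All the names (payload, ready, producer, consumer, run_once) are made up for illustration and are not from the post. The release/acquire pair is what makes this correct on both architectures. If both orderings were std::memory_order_relaxed instead, the code would typically still appear to work on x86 (assuming the compiler doesn't reorder the stores), but on aarch64 the consumer could see ready == true and still read a stale payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain (non-atomic) data
std::atomic<bool> ready{false};  // flag used to publish the data

void producer() {
    payload = 42;  // 1. write the data
    // 2. publish it: "release" keeps the write above from being
    //    reordered after this store.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // "acquire" keeps the read of payload below from being
    // reordered before this load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // spin until the flag is set
    }
    return payload;  // guaranteed to observe the producer's write
}

// Runs one producer/consumer round and returns what the consumer saw.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Calling run_once() from main should return 42 on either architecture. On x86 the release store and acquire load compile down to ordinary mov instructions, which is why code that forgets them often gets away with it there.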

Again, this is a simplification of what happens but I think it illustrates the difference to some degree.