The article twice explicitly mentions things that are only true for x86 (grep for it). In addition, the statement at the end is definitely not true for POWER: "As soon as the data is read/written to the L1 cache, the hardware-coherency protocol takes over and provides guaranteed coherency across all global threads. Thus ensuring that if multiple threads are reading/writing to the same variable, they are all kept in sync with one another."
Those two things are not x86-specific (the author only gives x86 as an example). And the statement you quote is certainly true for POWER or any other cache-coherent architecture.
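To make the point concrete, here is a minimal C++ sketch (my own illustration, not code from the article). Several threads hammer a single shared counter using relaxed atomics, which ask for no ordering with respect to other memory locations. The helper name `run_relaxed_counter` and its parameters are hypothetical. On POWER, relaxed read-modify-write operations like this typically compile to a load-reserve/store-conditional loop with no fence instructions, so the exact final count rests on per-location coherence alone. Where x86 and POWER do differ is in how accesses to *different* locations may be reordered, which is a memory-ordering question rather than a coherency one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the article): every thread increments one
// shared counter using relaxed atomics, which request atomicity on this
// single location but no ordering relative to any other memory location.
long run_relaxed_counter(int num_threads, long iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);

    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters_per_thread] {
            for (long i = 0; i < iters_per_thread; ++i) {
                // Atomic increment of the one shared variable.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // join() synchronizes with each thread's completion, so the load
    // below observes every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling `run_relaxed_counter(4, 100000)` should return 400000 on any conforming implementation, x86 or POWER alike: no increment is lost even though the code never asks for a barrier.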