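Copying the value from the 80-bit x87 register to the 64-bit memory cell rounds it to double precision, so the stored result no longer matches the value still held in the register. That is the corruption you are seeing. Declaring the variable volatile tells the compiler not to apply the optimisation responsible here, which was keeping the value in a register, so every read and write goes through the 64-bit memory cell.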
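To make that concrete, here is a minimal sketch of the volatile workaround. The helper names `store_as_double` and `quotient_matches` are made up for illustration and are not from the original code. The idea is that both sides of the comparison are pushed through a 64-bit memory cell, so neither side can keep extra register precision.

```c
#include <assert.h>

/* Hypothetical helper: round-trips x through a volatile double.
 * The volatile qualifier obliges the compiler to store the value to a
 * 64-bit memory cell and load it back, so any excess register
 * precision is rounded away. */
static double store_as_double(double x)
{
    volatile double cell = x;
    return cell;
}

/* Hypothetical check: computes a / b twice and compares the results
 * after both have been rounded to double. Returns 1 when they match. */
static int quotient_matches(double a, double b)
{
    assert(b != 0.0); /* keep the example away from division by zero */

    double stored = store_as_double(a / b);
    return stored == store_as_double(a / b);
}
```

Without the round trip, a build that targets the x87 unit may compare a rounded 64-bit value against an 80-bit one and report a mismatch. Whether that actually happens depends on the compiler, its flags and the target, so treat this as an illustration rather than a guaranteed reproduction.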
But doesn't that make it a compiler bug? If the root cause is that the compiler truncates data when it shouldn't, isn't the fundamental issue in the compiler rather than in the code?