That depends on how you define "properly written." It was easy to write seemingly correct code that assumed TSO (total store order) and worked on x86 but does not work on architectures with weaker memory consistency.
Code that assumes TSO is likely broken on x86 as well. Using the volatile keyword doesn't change that in any meaningful way either, since that part of the language is under-specified. C compilers are free to perform a lot of non-obvious optimizations, including reordering ordinary (non-volatile) loads and stores around volatile accesses.
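To make the volatile point concrete, here is a minimal sketch of the flag-passing pattern that tends to lean on TSO. All of the names (payload, ready, publish, consume, single_thread_check) are made up for illustration, and the second thread is only implied, so treat it as a picture of the problem rather than a reproduction of it.

```c
#include <assert.h>

static int payload;          /* ordinary, non-volatile data */
static volatile int ready;   /* flag that a hypothetical reader thread would poll */

/* Writer side of the fragile pattern. */
void publish(int value)
{
    payload = value;  /* plain store */
    ready = 1;        /* volatile store: only ordered against other volatile
                         accesses, so the compiler may move the plain store
                         above to after this line */
}

/* Reader side. Spins until the flag is set, then reads the data.
   If another thread ran publish(), nothing here guarantees that the
   payload store is visible once ready reads as 1. */
int consume(void)
{
    while (!ready) {
        /* busy-wait; would spin forever if publish() is never called */
    }
    return payload;
}

/* Within a single thread, program order is preserved as far as that thread
   can observe, so this check always passes. That is part of why the
   pattern looks correct in casual testing. */
void single_thread_check(void)
{
    publish(42);
    assert(consume() == 42);
}
```

Whether a given compiler actually reorders those two stores depends on the compiler, its version, and the optimization flags, which is exactly the problem.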
Put another way, there isn't anything in the base C spec (at least before C11 added stdatomic.h) that provides a guaranteed memory ordering barrier, which is why you have to depend on third-party specifications to get those guarantees. For example, if a program uses pthreads or OpenMP, it must also use their synchronization primitives to ensure portability.
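As a rough illustration of leaning on the threading library for ordering, the sketch below hands a value from one thread to another using only a pthreads mutex and condition variable. The struct mailbox type and the must, producer, and run_handoff functions are hypothetical names for this example; the pthread_* calls are the standard POSIX ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state; ready and payload are only touched
   while lock is held. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int ready;
    int payload;
};

/* Abort if a pthreads call fails (the check itself is compiled out
   under NDEBUG, the call is still made). */
static void must(int rc)
{
    assert(rc == 0);
    (void)rc;
}

/* Producer thread: writes the data and the flag under the mutex. */
static void *producer(void *arg)
{
    struct mailbox *box = arg;

    pthread_mutex_lock(&box->lock);
    box->payload = 42;
    box->ready = 1;
    pthread_cond_signal(&box->cond);
    pthread_mutex_unlock(&box->lock);
    return NULL;
}

/* Starts the producer, waits for the flag, and returns the payload seen. */
int run_handoff(void)
{
    struct mailbox box;
    pthread_t tid;
    int seen;

    box.ready = 0;
    box.payload = 0;
    must(pthread_mutex_init(&box.lock, NULL));
    must(pthread_cond_init(&box.cond, NULL));
    must(pthread_create(&tid, NULL, producer, &box));

    pthread_mutex_lock(&box.lock);
    while (!box.ready)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&box.cond, &box.lock);
    seen = box.payload;                     /* read under the same mutex */
    pthread_mutex_unlock(&box.lock);

    must(pthread_join(tid, NULL));
    pthread_cond_destroy(&box.cond);
    pthread_mutex_destroy(&box.lock);
    return seen;
}
```

The ordering guarantee here comes from the lock and unlock calls, not from anything about the target CPU, so the same source should behave the same way on x86 and on a weakly ordered machine.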
That isn't to say that a particular piece of code built with a particular compiler, version, and set of switches yields a wrong program, just that it's quite possible that changing compilers or flags will result in "incorrect" code generation.
True, but outside of memory ordering, x86_64 and ARM64 are probably among the easiest to port between. Endianness, alignment, and type sizes are the same, for example. Plus a lot of code has already been ported to both.
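If you want to check those shared assumptions rather than trust them, something like the sketch below would do it. The helper names are invented for this example, and the sizes it tests are an assumption about typical LP64 targets such as Linux or macOS on either architecture; 64-bit Windows uses 4-byte long, so the second helper would return 0 there.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Compile-time check: refuse to build on targets without 8-bit bytes. */
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* look at the lowest-addressed byte */
    return first == 1;
}

/* Returns 1 if the host matches the little-endian LP64 layout that
   typical x86_64 and ARM64 Unix-like targets share, 0 otherwise. */
int matches_lp64_little_endian(void)
{
    return is_little_endian()
        && sizeof(int) == 4
        && sizeof(long) == 8
        && sizeof(void *) == 8;
}
```

A port between the two architectures can call matches_lp64_little_endian() once at startup, or turn the size checks into further static_assert lines, and then focus the real effort on memory ordering.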