To what extent is this simply working around the weirdness of x86? Do these improvements apply to something like MIPS, ARM64, or RISC-V that have inherently simpler ISAs?
In this particular case the improvements were universal, though the paper says the optimizations were done on x86. One idea was to use LLVM IR, but the intuition was that running one optimizer on top of another was unlikely to work properly.
My guess: using LLVM IR would mean the LLVM optimiser could make the results noisier or harder to interpret once the IR was compiled down to code that actually executes.