It begs a question though: How many instructions in <insert ISA here> are equivalent? I assume that a compiler writer has a list of equivalents and will typically choose the shorter one?
The only issue is that on a fixed-width RISC ISA all these instructions are the same length, so swapping them would gain you very little, and more likely nothing, unless you are chasing some corner case like "XOR instruction value compresses slightly better than ADD because.."
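To make the equal-length point concrete, here is a rough sketch that hand-encodes two ways of zeroing a register. It assumes RISC-V RV32I as the example fixed-width ISA (my pick for illustration, nobody above named one), and the helper names are made up.

```python
import struct

def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the fields of an RV32I R-type instruction into one 32-bit word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack the fields of an RV32I I-type instruction (12-bit immediate)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

ZERO = 0   # x0, hard-wired zero register
A0 = 10    # x10

# xor a0, a0, a0
xor_word = encode_r_type(0b0000000, A0, A0, 0b100, A0, 0b0110011)
# addi a0, zero, 0  (what "li a0, 0" assembles to)
addi_word = encode_i_type(0, ZERO, 0b000, A0, 0b0010011)

# Both serialize to exactly one little-endian 32-bit word.
xor_bytes = struct.pack("<I", xor_word)
addi_bytes = struct.pack("<I", addi_word)

print(len(xor_bytes), len(addi_bytes))
```

Either way you spend one 4-byte word, so on an ISA like this the choice cannot be about code size.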
There are a number of considerations there. Size is only one of them; speed and effects on internal processor state are two others. For instance, a larger, slower instruction might prevent a pipeline stall in a particular function, or might enable loop unrolling or allow a shorter unrolled loop, while in a similar function that doesn’t pipeline the same way the compiler will choose the faster instruction.
Well, that 0 you are loading comes from the instruction itself, so it is already "there". It boils down to the fact that the XOR instruction is shorter.
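For anyone who wants to see the size difference, here is a small sketch with the usual x86 encodings for zeroing EAX in 32-bit or 64-bit mode. The constant and function names are just for illustration, and an assembler may pick the equivalent `33 C0` form for the XOR instead.

```python
# mov eax, 0   -> opcode B8 followed by a 4-byte little-endian immediate
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])
# xor eax, eax -> opcode 31 followed by one ModRM byte (C0 = eax, eax)
XOR_EAX_EAX = bytes([0x31, 0xC0])

def immediate_of_mov(code):
    """Return the 32-bit immediate stored right after the B8 opcode byte."""
    return int.from_bytes(code[1:5], "little")

# The zero really is sitting inside the MOV instruction bytes.
print(immediate_of_mov(MOV_EAX_0))

# Instruction lengths in bytes: MOV first, then XOR.
print(len(MOV_EAX_0), len(XOR_EAX_EAX))
```

So the MOV carries its zero around as four immediate bytes, which is exactly why it ends up three bytes longer than the XOR.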
In fact, in theory the XOR is the slower one, because XOR has data dependencies on its arguments, so an out-of-order processor could be delayed waiting for them. However, x86 processors have special logic that recognizes XOR of a register with itself, so it doesn't carry any dependency on the arguments.
In addition to other concerns, processors usually treat xor specially, since it was the best way to zero a register for so long that it became ubiquitous. Often its performance impact is equivalent to a no-op.
That 0 has to come from somewhere, while in the other case XORing a register with itself does not involve loading any data. It's also shorter.