Hacker News new | ask | show | jobs
by wolfgke 3412 days ago
I know that. But this is only relevant if you exchange registers with memory and is not of relevance if you exchange two registers. I accept that this is a good point if some variables are moved to the stack because of register spilling or because you want to use the address of the variable (which is not the case here).

So I still stand by my point: What is the reason why the compiler uses `mov` for exchanging two registers here instead of `xchg`?

1 comments

I think that's because (at least on some CPUs) it takes three macro-operations. http://www.agner.org/optimize/microarchitecture.pdf (section 17.4, page 188):

"Vector path instructions are less efficient than single or double instructions because they require exclusive access to the decoders and pipelines and do not always reorder optimally. For example:

    ; Example 17.1. AMD instruction breakdown
    xchg  eax, ebx   ; Vector path, 3 ops
    nop              ; Direct path, 1 op
    xchg  ecx, edx   ; Vector path, 3 ops
    nop              ; Direct path, 1 op
This sequence takes 4 clock cycles to decode because the vector path instructions must decode alone."
This is indeed a good explanation - I admit I was not aware of this detail of the K8/K10 processors.