| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Someone 3412 days ago
	When used to swap with data in memory, the reason is that it is faster. xchg is atomic. To do that, it implicitly locks its target address. That makes it slower than the series of moves. http://www.agner.org/optimize/instruction_tables.pdf: "Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand."

1 comments

wolfgke 3412 days ago

I know that. But this is only relevant if you exchange registers with memory and is not of relevance if you exchange two registers. I accept that this is a good point if some variables are moved to the stack because of register spilling or because you want to use the address of the variable (which is not the case here).

So I still stand by my point: What is the reason why the compiler uses `mov` for exchanging two registers here instead of `xchg`?

link

Someone 3412 days ago

I think that's because (at least on some CPUs) it takes three macro-operations. http://www.agner.org/optimize/microarchitecture.pdf (section 17.4, page 188):

"Vector path instructions are less efficient than single or double instructions because they require exclusive access to the decoders and pipelines and do not always reorder optimally. For example:

    ; Example 17.1. AMD instruction breakdown
    xchg  eax, ebx   ; Vector path, 3 ops
    nop              ; Direct path, 1 op
    xchg  ecx, edx   ; Vector path, 3 ops
    nop              ; Direct path, 1 op

This sequence takes 4 clock cycles to decode because the vector path instructions must decode alone."

link

wolfgke 3412 days ago

This is indeed a good explanation - I admit I was not aware of this detail of the K8/K10 processors.

link