| HN Mirror



	by hrydgard 1571 days ago
	This is old, cute, but simply slow and a really bad idea on modern hardware.

2 comments

mi_lk 1571 days ago

> slow and a really bad idea on modern hardware.

say more?


bruce343434 1571 days ago

care to elaborate? It's just 3 instructions.


jleahy 1571 days ago

On a modern x86 cpu the ‘xchg’ instruction performs a swap and can do so entirely in the front-end via register renaming. It doesn’t even require a micro-op. This ‘trick’ actually requires executing micro-ops and creates an unnecessary data dependency (which is actually worse than the micro-ops themselves).

Better to just use more variables and let the register allocator in the compiler decide what to do. If it’s a loop then unrolling it once could remove the need for any swapping at all, for example.
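As a rough illustration of the unrolling point (not from the thread; the function names are made up for this sketch), Euclid's algorithm is a loop that shuffles two variables on every iteration. Unrolling it once lets the two variables trade roles, so no swap or temporary is left in the source:

```c
/* Hypothetical example: both functions compute gcd(a, b). */

/* Straightforward version: every iteration moves values between a and b. */
unsigned gcd_swap(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;  /* temporary holds the new remainder */
        a = b;
        b = t;
    }
    return a;
}

/* Unrolled once: a and b alternate roles, so nothing is swapped. */
unsigned gcd_unrolled(unsigned a, unsigned b) {
    for (;;) {
        if (b == 0) return a;
        a %= b;              /* a now holds the remainder */
        if (a == 0) return b;
        b %= a;              /* roles reversed for the second half */
    }
}
```

Either way, an optimizing compiler is free to keep the values in whichever registers it likes; the unrolled form just makes the absence of a swap explicit.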


zwegner 1571 days ago

> On a modern x86 cpu the ‘xchg’ instruction performs a swap and can do so entirely in the front-end via register renaming. It doesn’t even require a micro-op.

This is only true for AMD cpus; on Intel, xchg is 3 uops. Still better than the xor trick, though.

Source: https://www.uops.info/html-instr/XCHG_R64_R64.html


marginalia_nu 1571 days ago

What if you want to swap two large memory blocks? Should be doable in a maximally cache-friendly way with SIMD XORs I think.


tomn 1571 days ago

this isn't going to be any better than just loading from the two buffers into registers then storing the other way around, like:

  a = load(ap)
  b = load(bp)
  store(ap, b)
  store(bp, a)
  ap += step; bp += step;

any instructions to "do the swap" are a waste because generally load-store are separate instructions in SIMD instruction sets (and even if that wasn't the case, that's how they would get executed anyway)

if you want to avoid polluting the cache there are SSE instructions for loading without caching, which might be worthwhile

edit: this might be useful in a SIMD context where you need to swap two registers, where the cost of using another register is higher than the cost of the 3 arithmetic instructions. i could totally imagine that happening, but it's nothing to do with caches or memory
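A portable C sketch of the load/load/store/store loop above, for anyone who wants something compilable. The name swap_blocks and the 64-byte chunk size are assumptions made for this example, not anything from the thread. It uses fixed-size memcpy calls rather than intrinsics, which compilers commonly lower to vector loads and stores, though that is not guaranteed:

```c
#include <stddef.h>
#include <string.h>

/* Swap two non-overlapping buffers of n bytes, one chunk at a time.
 * STEP is an arbitrary chunk size chosen for illustration. */
void swap_blocks(void *ap, void *bp, size_t n) {
    enum { STEP = 64 };
    unsigned char *a = ap;
    unsigned char *b = bp;
    unsigned char ta[STEP], tb[STEP];

    while (n > 0) {
        size_t k = n < STEP ? n : STEP;  /* the last chunk may be short */
        memcpy(ta, a, k);                /* a = load(ap)  */
        memcpy(tb, b, k);                /* b = load(bp)  */
        memcpy(a, tb, k);                /* store(ap, b)  */
        memcpy(b, ta, k);                /* store(bp, a)  */
        a += k;
        b += k;
        n -= k;
    }
}
```

Note there is no XOR anywhere: the "swap" is just storing each loaded chunk to the other address.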


chmod775 1571 days ago

> What if you want to swap two large memory blocks?

In theory, maybe.

But if that happens in your application and is performance critical, you probably should change it such that you're swapping pointers to them instead...
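A minimal sketch of the pointer-swapping alternative, assuming the blocks are reached through a small handle struct (Buffer and swap_buffers are hypothetical names for this example). The cost is constant no matter how large the blocks are:

```c
#include <stddef.h>

/* Hypothetical handle type: owns nothing, just points at a block. */
typedef struct {
    unsigned char *data;
    size_t len;
} Buffer;

/* Swap which block each handle refers to; no block contents are copied. */
void swap_buffers(Buffer *x, Buffer *y) {
    Buffer t = *x;
    *x = *y;
    *y = t;
}
```

This only helps when every user of the data goes through the handle; anything holding a raw pointer into a block still sees the old contents.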


marginalia_nu 1571 days ago

There are very real cases where you may want to swap around actual memory though. Garbage collectors do quite a lot of this type of large memory exchange.


animal531 1571 days ago

I'd also hazard that throwing in an if statement in place of a temporary variable is a bad idea.
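For context, a sketch of what the if statement is presumably guarding against (an assumption on my part; the function names are invented for this example). When both pointers refer to the same object, the unguarded xor swap zeroes it, so the trick needs a branch that the plain temporary version never does:

```c
/* XOR swap without a guard: breaks when x and y alias. */
void xor_swap_unguarded(unsigned *x, unsigned *y) {
    *x ^= *y;  /* if x == y, this sets the shared value to 0 */
    *y ^= *x;
    *x ^= *y;
}

/* XOR swap with the guard: correct, but now contains a branch. */
void xor_swap_guarded(unsigned *x, unsigned *y) {
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}

/* Plain swap: correct for aliased pointers with no branch. */
void swap_tmp(unsigned *x, unsigned *y) {
    unsigned t = *x;
    *x = *y;
    *y = t;
}
```

Calling xor_swap_unguarded(&v, &v) leaves v at 0, while the other two leave it unchanged.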
