Hacker News new | ask | show | jobs
by jhoechtl 59 days ago
Back in the stone ages XOR ing was just 1 byte of opcode. Habbits stick. In effect XORing is no longer faster since a long time.
3 comments

The XOR trick is implemented as a (malloc from register file) on modern processors, implemented in the decoder and it won't even issue a uOp to the execution pipelines.

Its basically free today. Of course, mov RAX, 0 is also free and does the same thing. But CPUs have limited decoder lengths per clock tick, so the more instructions you fit in a given size, the more parallel a modern CPU can potentially execute.

So.... definitely still use XOR trick today. But really, let the compiler handle it. Its pretty good at keeping track of these things in practice.

-----------

I'm not sure if "sub" is hard-coded to be recognized in the decoder as a zero'd out allocation from the register file. There's only certain instructions that have been guaranteed to do this by Intel/AMD.

sub is also recognized as zeroing idiom for register file. Intel documents these in "3.5.1.7 Clearing Registers and Dependency Breaking Idioms" from Optimization Reference Manual: https://www.intel.com/content/www/us/en/developer/articles/t...

Here's html version: https://zzqcn.github.io/perf/intel_opt_manual/3.html#clearin...

AMD has similar list in "2.9.2 Idioms for Dependency removal" from "Software Optimization Guide for the AMD Zen5 Microarchitecture" document: https://docs.amd.com/v/u/en-US/58455_1.00

Depending on what's stone-age for you, a SUB with a register was also only one byte, and was the same cost as XOR, at least in the Intel/Zilog lineage all the way back to the 70s ;)
The article’s point is about why XOR is preferred over SUB, both being one byte.

MOV is right out.