Hacker News new | ask | show | jobs
by dragontamer 60 days ago
The XOR trick is implemented as a (malloc from register file) on modern processors, implemented in the decoder and it won't even issue a uOp to the execution pipelines.

Its basically free today. Of course, mov RAX, 0 is also free and does the same thing. But CPUs have limited decoder lengths per clock tick, so the more instructions you fit in a given size, the more parallel a modern CPU can potentially execute.

So.... definitely still use XOR trick today. But really, let the compiler handle it. Its pretty good at keeping track of these things in practice.

-----------

I'm not sure if "sub" is hard-coded to be recognized in the decoder as a zero'd out allocation from the register file. There's only certain instructions that have been guaranteed to do this by Intel/AMD.

1 comments

sub is also recognized as zeroing idiom for register file. Intel documents these in "3.5.1.7 Clearing Registers and Dependency Breaking Idioms" from Optimization Reference Manual: https://www.intel.com/content/www/us/en/developer/articles/t...

Here's html version: https://zzqcn.github.io/perf/intel_opt_manual/3.html#clearin...

AMD has similar list in "2.9.2 Idioms for Dependency removal" from "Software Optimization Guide for the AMD Zen5 Microarchitecture" document: https://docs.amd.com/v/u/en-US/58455_1.00