> Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version.

Accepted (though the central advantage of SHLX (with -march=native) over SHL (without -march=native) in this example lies in the greater flexibility of its register operands). In my defense, I have only had a computer whose processor supports BMI2 for a few months now, so I could not play around with BMI2 before. Otherwise I am sure I would have known such tricks.
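To make the SHLX/SHL point concrete, here is a minimal sketch of a shift by a run-time count. The function `shift_left` is a hypothetical stand-in, not the code discussed in this thread. Without BMI2, the variable-count SHL takes its count from CL, so the compiler may have to move the count there first. With BMI2 enabled (for example via -march=native on a CPU that supports it), a compiler can use SHLX instead, which takes the count from any general-purpose register, writes to a separate destination, and leaves the flags untouched. Which instruction you actually get depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helper, not from the thread: shift by a run-time count. */
uint64_t shift_left(uint64_t value, unsigned count)
{
    /* Masking keeps the count in 0..63, so the shift is well defined in C.
       It also matches what the hardware does with a 64-bit operand. */
    return value << (count & 63u);
}
```

Compiling this once with and once without -march=native and comparing the assembly output (for example with `gcc -O2 -S`) should show the difference, assuming the machine supports BMI2.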

> See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

I know that. My personal coding philosophy is to avoid such hacks, which exist only to work around performance bugs in outdated processors.