Hacker News new | ask | show | jobs
by wolfgke 3308 days ago
> Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version.

Accepted (though the central advantage of SHLX (with-march=native) and SHL (no -march=native) for this example lies in the greater flexibility of register parameters). To my defense I only have a computer whose processor supports BMI2 for a few months now - so I could not play around with BMI2 before. Otherwise I am sure I would have known such tricks.

> See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

I know that. My personal code philosophy is to avoid such hacks for circumventing performance bugs in outdated processors.

1 comments

By outdated you mean 5 year old? This is not for ancient Athlons. Those were 64 bit chips already.

Likewise preferring microcoded lea shafts everything older than Haswell on Intel side. Not to mention modern Atom.