by wolfgke
3308 days ago
> Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version.

Accepted (though the central advantage of SHLX (with -march=native) over SHL (without -march=native) in this example lies in the greater flexibility of register parameters). In my defense, I have only had a computer whose processor supports BMI2 for a few months, so I could not play around with BMI2 before. Otherwise I am sure I would have known such tricks.

> See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

I know that. My personal coding philosophy is to avoid such hacks for circumventing performance bugs in outdated processors.
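
To make the SHLX/SHL point concrete, here is a minimal sketch (the function name `shift_left` is made up for illustration, it is not from the code discussed above). Built without BMI2, x86-64 compilers typically emit SHL, which needs the count in CL; built with BMI2 enabled (for example -mbmi2, or -march=native on a CPU that supports it), they can emit SHLX, which takes the count from any general-purpose register and does not write flags. What you actually get depends on compiler version and flags, so check the output with `gcc -O2 -S`.

```c
#include <stdint.h>

/* Hypothetical example: left shift by a count known only at run time.
 * Without BMI2 the count usually has to be moved into CL for SHL;
 * with BMI2 the compiler is free to use SHLX with any register. */
uint64_t shift_left(uint64_t value, unsigned count)
{
    /* Masking keeps the shift well defined in C for counts >= 64;
     * it matches what the hardware does for 64-bit shifts anyway. */
    return value << (count & 63);
}
```

The C source is identical in both cases; only the instruction selection changes, which is why the flag matters more than the code.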
Likewise, preferring the microcoded lea shafts everything older than Haswell on the Intel side, not to mention modern Atom.