by thebolt00 485 days ago
It actually depends on which flags you pass to clang, and for good reason. A 3-term lea (base + index + displacement) uses "complex decoding" and thus has higher latency (and fewer available execution ports) on Intel microarchitectures before Ice Lake. If you run clang -O2 -mtune=icelake-client or -mtune=znver3 (or a later architecture), it will generate the single lea instruction.

As always with optimization choices, it comes down to cost modelling and trade-offs.
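
If you want to check this yourself, here is a minimal sketch. The function `addr3` and the file name `lea.c` are hypothetical and are not the code from the original post; it is just an expression with a base, a scaled index, and a constant displacement, which is the shape a compiler can fold into one lea.

```c
/* lea.c: hypothetical test case, not the function from the original post.
 * base + index * 4 + 12 has three address components
 * (base, scaled index, displacement), so it can be computed by a single
 * three-component lea or by a simpler lea followed by an add. */
long addr3(long base, long index)
{
    return base + index * 4 + 12;
}
```

Compiling it with `clang -O2 -S -o - lea.c` and again with `clang -O2 -mtune=icelake-client -S -o - lea.c` lets you compare the two outputs. Going by the explanation above, the second should be more likely to use the single lea, but the exact instruction sequence depends on your clang version and its default tuning.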

1 comment

Interesting. I guess GCC’s cost modeling is different, then? Or does it default to a newer machine?