by celeritascelery 1611 days ago
> predicted, not-taken branches tend to have 0 cycle latency in recent CPUs.

This is not the case. Due to instruction-level parallelism, the throughput could be unaffected, but you will always have a latency penalty. The CPU still needs to run the check (load the length and compare it to the index), and this adds latency. On top of that, the check also increases code size, which can impact the instruction cache and binary size. It’s a small penalty, but it’s not 0.
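To make the extra work concrete, here is a minimal Rust sketch of what a bounds-checked slice access amounts to. The function names are made up for illustration, and the exact instructions a compiler emits will vary; this only spells out the length load, the compare, and the branch in source form.

```rust
/// Ordinary indexing: the compiler inserts a length check and a
/// branch to a panic path unless it can prove `i` is in range.
pub fn load_checked(data: &[u64], i: usize) -> u64 {
    data[i]
}

/// The same access with the check written out by hand
/// (hypothetical helper, shown only to make the steps visible).
pub fn load_checked_explicit(data: &[u64], i: usize) -> u64 {
    // Read the length and compare it to the index.
    if i < data.len() {
        // SAFETY: `i < data.len()` was verified on the line above.
        unsafe { *data.get_unchecked(i) }
    } else {
        // Not-taken in the common case, but still present in the code.
        panic!("index out of bounds");
    }
}

/// No length read, no compare, no branch.
///
/// # Safety
/// The caller must guarantee `i < data.len()`.
pub unsafe fn load_unchecked(data: &[u64], i: usize) -> u64 {
    unsafe { *data.get_unchecked(i) }
}
```

The first two functions do the same work; the third drops the check and pushes the proof obligation onto the caller.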


Speculative execution lets the CPU continue along the predicted branch without stalling. You do need the ~2 instructions that fetch the inputs for the length test, but instruction-level parallelism can usually absorb them without hurting the latency of the array operation.
Do you have any evidence of this claim? Perhaps a benchmark?

This doesn't align with any of my performance optimization experience.

I went looking, and it seems I have to walk my claim back somewhat. Wide-issue out-of-order (OoO) processors hide a lot of the overhead, but not all of it.
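For anyone who wants to measure this on their own hardware, a rough Rust sketch of a comparison follows. Everything in it (the function names, the index pattern, the single timed run) is an assumption made for illustration, not a rigorous benchmark; a real measurement would want warm-up, many repetitions, and a look at the generated assembly, since the optimizer may remove or hoist the checks.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Sums `data[i]` for each `i` in `idx` with bounds-checked indexing.
fn sum_checked(data: &[u64], idx: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in idx {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Same loop without the per-access check.
///
/// # Safety
/// Every element of `idx` must be less than `data.len()`.
unsafe fn sum_unchecked(data: &[u64], idx: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in idx {
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}

/// Times each loop once over `n` scattered, in-range indices.
/// Returns (checked_sum, checked_time, unchecked_sum, unchecked_time).
pub fn compare(n: usize) -> (u64, Duration, u64, Duration) {
    let data: Vec<u64> = (0..n as u64).collect();
    // Scrambled indices, reduced modulo `n` so they are always in range.
    let idx: Vec<usize> = (0..n)
        .map(|k| k.wrapping_mul(2_654_435_761) % n)
        .collect();

    let start = Instant::now();
    let a = black_box(sum_checked(black_box(&data), black_box(&idx)));
    let checked_time = start.elapsed();

    let start = Instant::now();
    // SAFETY: every index was reduced modulo `n == data.len()`.
    let b = black_box(unsafe { sum_unchecked(black_box(&data), black_box(&idx)) });
    let unchecked_time = start.elapsed();

    (a, checked_time, b, unchecked_time)
}
```

Calling `compare(1 << 20)` from `main` in a release build and printing the two durations gives a first impression. The two sums should always match; any gap between the times depends heavily on the CPU and on what the compiler did with the checks.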