by celeritascelery
1611 days ago
> predicted, not-taken branches tend to have 0 cycle latency in recent CPUs.

This is not the case. Thanks to instruction-level parallelism, throughput may be unaffected, but you will always pay a latency penalty. The CPU still needs to run the check (load the length and compare it to the index), and this adds latency. On top of that, the check also increases code size, which can impact the instruction cache and binary size. It’s a small penalty, but it’s not 0.
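To make the check concrete, here is a rough sketch in Rust (my choice of language; the function names `sum_checked` and `sum_unchecked` are made up for illustration). The first function uses ordinary indexing, so each access conceptually loads the slice length and compares it to the index before the load. The second skips that comparison with `get_unchecked`, which is only sound if the caller guarantees every index is in bounds. Whether the difference is measurable depends on the CPU and on what the compiler manages to elide, so treat this as an illustration rather than a benchmark.

```rust
/// Sums `data[i]` for every `i` in `indices` using normal indexing.
/// Each `data[i]` carries a bounds check: compare `i` against `data.len()`
/// and panic if it is out of range.
pub fn sum_checked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Same sum, but without the per-access bounds check.
///
/// # Safety
/// The caller must guarantee that every value in `indices` is less than
/// `data.len()`; otherwise this is undefined behavior.
pub unsafe fn sum_unchecked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        // SAFETY: in-bounds indices are the caller's contract (see above).
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

Both functions return the same result for valid indices. The checked version simply carries the extra compare-and-branch per access, plus the panic path in the generated code.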