| HN Mirror



	by celeritascelery 1611 days ago
	> predicted, not-taken branches tend to have 0 cycle latency in recent CPUs.

	This is not the case. Due to instruction-level parallelism, throughput may be unaffected, but you will always pay a latency penalty. The CPU still needs to run the check (load the length and compare it to the index), and this adds latency. On top of that, the check increases code size, which can put pressure on the instruction cache and grows the binary. It's a small penalty, but it's not 0.
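	To make the check concrete, here is a minimal sketch, assuming the discussion is about slice bounds checks in a language like Rust (the comment itself does not name one). The function names `sum_checked` and `sum_unchecked` are made up for illustration. The first loop pays for the length compare and the branch on every access; the second skips both and pushes the guarantee onto the caller.

```rust
/// Bounds-checked indexing: each `data[i]` compares `i` against
/// `data.len()` and branches to a panic path if it is out of range.
fn sum_checked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Unchecked indexing: no compare and no branch are emitted for the access.
///
/// # Safety
/// Every value in `indices` must be less than `data.len()`.
unsafe fn sum_unchecked(data: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in indices {
        // SAFETY: the caller guarantees `i < data.len()`.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

	Both functions return the same sum for in-range indices. Whether the checked version is measurably slower depends on the CPU, the access pattern, and whether the compiler can prove the check redundant and remove it, so this is something to benchmark rather than assume.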


fulafel 1610 days ago

Speculative execution lets the CPU continue along the predicted branch without stopping. You do need the ~2 instructions that fetch the inputs for the length test, but instruction-level parallelism can usually absorb them without hurting the latency of the array operation.


errantmind 1610 days ago

Do you have any evidence of this claim? Perhaps a benchmark?

This doesn't align with any of my performance optimization experience.


fulafel 1600 days ago

I went looking, and it seems I have to walk my claim back somewhat. Wide-issue OoO processors hide a lot of the overhead, but not all of it.
