|
|
|
|
|
by BeeOnRope
528 days ago
|
|
We are interested in the software visible performance effects of pipelining. For small benchmarks that don't miss in the predictors or icache, this mostly means execution pipelining. That's the type of pipelining the article is discussing and the type of pipelining considered in instruction performance breakdowns considered by Agner, uops.info, simulated by LLVM-MCA, etc. I.e., a lot of what you need to model for tight loops only depends on the execution latencies (as little as 1 cycle), and not on the full pipeline end-to-end latency (almost always more than 10 cycles on big OoO, maybe more than 20). |
|
Those are different notions of pipelining with different motivations: one is motivated by "instruction-level parallelism," and the other is motivated by "achieving higher clock rates." If 64-bit multiplication were not pipelined, the minimum achievable clock period would be constrained by "how long it takes for bits to propagate through your multiplier."