|
|
|
|
|
by cogman10
182 days ago
|
|
Unfortunately, what you'll find if you dig into this is that cycles/op isn't as meaningful as you might imagine. Most modern CPUs are out of order executors. That means that while a floating point operation might take 4 cycles to complete, if you put a bunch of other instructions around it like adds, divides, and multiplies, those will all finish at roughly the same time. That makes it somewhat hard to reason about exactly how long any given set of operations will be. A FloatMul could take 4 cycles on it's own, and if you have FloatMul
ADD
MUL
DIV
That can also take 4 cycles to finish. It's simply not as simple as saying "Let's add up the cycles for these 4 ops to get the total cycle count".Realistically, what you'll actually be waiting on is cache and main memory. This fact is so reliable that it underpins SMT. It's why most modern CPUs will do that in some form. |
|
---
Counter-nitpick .. your statement of "if you put a bunch of other instructions around it" assumes there are no data dependencies between instructions.
In the example you gave:
Sure .. if all of those are operating on independent data sources, they could conceivably retire on the same cycle, but in the context of the article (approximating the performance profile of a series of operations) we're assuming they have data dependencies on one another, and are going to be executed serially.