by bri3d
1166 days ago
Hmm. This gets into the fuzzy definition of "loop in microcode", which depends on how you look at the system. I don't think the actual looping happens in microcode; that is, the ucode unit doesn't jump back to earlier ucode, which wouldn't make sense architecturally for a variety of reasons. However, for 64-bit integer division on somewhat older Intel processors (Kaby Lake, for example), I do think the division is both iterative and microcoded (rather than fixed-function logic), with the ucode emitting an _unrolled_ loop into the scheduler. IDIV with 64-bit operands on Kaby Lake takes 56/57 uOps (!) vs. the still-huge 11 uOps for 32-bit IDIV. (For comparison, we're down to 5/4 uOps for 64-bit division on Alder Lake.)
For example, Zen4 64-bit DIV is listed as: 2 uOps, 10-18 cycles latency, 7-12 cycles inverse throughput.
This suggests uOps with variable execution latency, i.e. the iteration happens inside the execution unit rather than as a fixed unrolled loop streamed out by the microcode part of the frontend.
You may be right that there were some CPUs that did the fixed unrolling, but it doesn't seem that common.
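
To make the two models concrete, here is a toy sketch in Python. It is only an illustration of "fixed step count" versus "data-dependent step count", not a description of how any real divider is built; the function names `div_fixed_steps` and `div_early_exit` are made up for this example, and plain one-bit-per-step restoring division is an assumption chosen for simplicity.

```python
def div_fixed_steps(n, d, bits=64):
    """Unsigned restoring division with one step per quotient bit.

    Always runs exactly `bits` steps, loosely analogous to a fixed,
    unrolled sequence of uOps. Assumes 0 <= n < 2**bits and d > 0.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r, steps = 0, 0, 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # shift in the next dividend bit
        q <<= 1
        if r >= d:                     # trial subtraction succeeds
            r -= d
            q |= 1
        steps += 1
    return q, r, steps


def div_early_exit(n, d):
    """Same recurrence, but skips the leading zero bits of the dividend.

    The step count now depends on the operand value, loosely analogous
    to an execution unit that iterates internally with variable latency.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r, steps = 0, 0, 0
    for i in range(n.bit_length() - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)
        q <<= 1
        if r >= d:
            r -= d
            q |= 1
        steps += 1
    return q, r, steps


print(div_fixed_steps(100, 7))  # (14, 2, 64): quotient, remainder, steps
print(div_early_exit(100, 7))   # (14, 2, 7): same result, fewer steps
```

Both functions return the same quotient and remainder; only the number of steps differs. A range like "10-18 cycles latency" looks more like the second shape than the first, although that is an inference from the published numbers rather than anything documented.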