by atq2119
1169 days ago
While division may still decode to multiple uOps, I seriously doubt that there's a loop in microcode on modern processors. The pipeline latency makes that infeasible. The looping logic is almost certainly a bit of fixed function hardware in the execution unit.

However, for 64-bit integer division on somewhat older Intel processors (Kaby Lake, for example), I do think division is both iterative and microcoded rather than fixed-function logic, with the ucode emitting an _unrolled_ loop into the scheduler.
IDIV with 64-bit operands on Kaby Lake takes 56/57 uOps (!), versus a still-huge 11 uOps for 32-bit IDIV. For comparison, we're down to 5/4 uOps for 64-bit division on Alder Lake.
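
To make "iterative" concrete, here is a minimal sketch of textbook radix-2 restoring division in C. It only illustrates the kind of shift/compare/subtract loop that could in principle be unrolled into a straight-line uOp sequence; it is not a claim about what Intel's microcode or divider actually does, and real dividers typically produce more than one quotient bit per step. The names `divmod64` and `restoring_div_u64` are made up for this example, and the caller is assumed to pass a nonzero divisor.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: quotient and remainder. */
typedef struct {
    uint64_t quot;
    uint64_t rem;
} divmod64;

/* Unsigned 64-bit restoring division, one quotient bit per iteration.
 * Assumption: d != 0 (no divide-by-zero handling in this sketch). */
static divmod64 restoring_div_u64(uint64_t n, uint64_t d) {
    divmod64 r = {0, 0};
    for (int i = 63; i >= 0; i--) {
        /* Remember the bit that is about to be shifted out of the
         * partial remainder; it matters when d is larger than 2^63. */
        uint64_t carry = r.rem >> 63;
        /* Shift the next dividend bit into the partial remainder. */
        r.rem = (r.rem << 1) | ((n >> i) & 1);
        /* If the (65-bit) partial remainder is at least d, subtract d
         * and set this quotient bit. Wraparound gives the right result
         * when carry is set. */
        if (carry || r.rem >= d) {
            r.rem -= d;
            r.quot |= (uint64_t)1 << i;
        }
    }
    return r;
}
```

Each iteration depends on the remainder from the previous one, so unrolling a loop like this removes the branch and loop overhead but not the serial dependency chain.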