| It's way more complex than this on modern CPUs, so it's harder to explain. > is microcode still used for division? Yes, almost everything on a modern CPU uses "microcode" of some kind, although the term gets kind of hazy, since everything is out-of-order and a lot of instructions are issued in parallel. In a typical modern CPU, the "frontend" will decompose instructions into uOps, which then get pushed into a "reservation station" / "scheduler." The scheduler queues and reorders uOps in various surprising and complicated ways to try to account for interdependencies and memory latency. Eventually, a uOp is issued to an "execution port," which is connected to a fixed-function piece of logic that actually performs part or all of an operation (for example, an arithmetic logic unit / ALU). But, while microcode will be _involved_ still, most modern CPUs will have fixed-function hardware for the meaty parts of the division instruction - generally speaking, they won't implement division _purely_ using microcode like the algorithm documented in the article. > are there better and faster alternatives? They're not "alternatives" per se, but there are a _lot_ of ways to implement division algorithmically, and a _lot_ of ways to trade size for speed. Improving integer division performance has been a fairly big focus in newer CPU microarchitectures, with major improvements arriving in the latest Intel, AMD, and Apple Silicon architectures. https://uops.info/table.html will show how many uOps a given x86 instruction decomposes to, what ports it uses, and rough latency estimates for the instruction's execution. Here's some reading I found, with a lot of references: * A discussion of various modern division implementations: https://stackoverflow.com/questions/71420116/why-is-there-on... 
* Performance comparison of integer division in modern architectures with some implementation speculation: https://news.ycombinator.com/item?id=27133804 * An in-depth look at the division unit in mid-old Intel CPUs (Penryn), look for "Radix-16": https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... |
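To make the "trade size for speed" point above concrete, here is a rough Python sketch of digit-at-a-time division where the number of quotient bits retired per step is a parameter. The function name `divide_radix` and its parameters are made up for illustration, and this is not how any particular CPU does it: real dividers (including radix-16 designs like the one mentioned in the last link) pick each quotient digit with lookup tables and extra hardware rather than the inner subtract loop used here.

```python
def divide_radix(dividend, divisor, bits_per_step=1, width=32):
    """Unsigned long division that retires `bits_per_step` quotient bits per step.

    bits_per_step=1 behaves like a classic shift-and-subtract loop (radix 2);
    bits_per_step=4 produces one hex digit per step (radix 16).
    Returns (quotient, remainder, steps).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    digit_mask = (1 << bits_per_step) - 1
    quotient = 0
    remainder = 0
    steps = 0
    # Walk the dividend from the most significant digit to the least.
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & digit_mask)
        # Digit selection: hardware would do this in one go with comparators
        # or a table; the loop here only models the result, not the cost.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
        steps += 1
    return quotient, remainder, steps


radix2 = divide_radix(1000, 7, bits_per_step=1)
radix16 = divide_radix(1000, 7, bits_per_step=4)
print(radix2)   # same quotient and remainder, 32 steps
print(radix16)  # same quotient and remainder, 8 steps
```

Both calls produce the same quotient and remainder; the higher radix just needs fewer steps, at the price of a more expensive digit selection per step. That is roughly the area-versus-latency knob hardware designers get to turn.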
µOps are different from the kind of microcode described here. Older x86 CPUs basically had a "bytecode interpreter" in microcode ROM: every instruction (except for some trivial set/clear flag operations) would go to a specific entry point, and even something simple like addition would take at least two µ-instrs.
The 80486 was the first generation that could decode some opcodes directly into one-cycle µOps.
Edit:
The term "interpreter" is of course a simplified description. The decoding itself is done outside of microcode, and there is logic to select different registers or ALU operations etc. But conceptually it's similar in that almost every opcode transfers control to some sequence of microinstructions ending in "RNI", which acts like a jump back to the main interpreter loop.
The 8086 is actually the closest to the "RISC-like microcode" meme, in that even address computation is done by a series of µ-instrs.
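As a toy illustration of the "interpreter" picture above, here is a small Python sketch. Everything in it is hypothetical: the ROM contents, the micro-op names `MOVE` and `ADD`, the `ENTRY_POINTS` table, and the register names are invented for the example and are not real 8086 microcode. It only shows the control flow: each opcode jumps to an entry point in a micro-ROM, runs micro-instructions in sequence, and returns to the main loop on `RNI`.

```python
# Hypothetical flat micro-ROM. Each entry is one micro-instruction.
MICROCODE_ROM = [
    # address 0: MOV dst, src
    ("MOVE", "src", "dst"),
    ("RNI",),
    # address 2: ADD dst, src (two working micro-instructions, then RNI)
    ("MOVE", "src", "tmp"),
    ("ADD", "tmp", "dst"),
    ("RNI",),
]

# Decoding happens outside the microcode: opcode -> entry address.
ENTRY_POINTS = {"MOV": 0, "ADD": 2}


def run(program, registers):
    """Execute (opcode, dst, src) tuples; return (registers, micro_steps)."""
    regs = dict(registers)
    micro_steps = 0
    for opcode, dst, src in program:  # the "main interpreter loop"
        # Operand selection logic: map symbolic slots to real registers.
        operands = {"dst": dst, "src": src, "tmp": "tmp"}
        upc = ENTRY_POINTS[opcode]  # micro program counter
        while True:
            micro_op = MICROCODE_ROM[upc]
            upc += 1
            micro_steps += 1
            if micro_op[0] == "RNI":  # "run next instruction"
                break
            kind, a, b = micro_op
            source, target = operands[a], operands[b]
            if kind == "MOVE":
                regs[target] = regs[source]
            elif kind == "ADD":
                regs[target] = (regs[target] + regs[source]) & 0xFFFF
    return regs, micro_steps


regs, steps = run(
    [("MOV", "ax", "bx"), ("ADD", "ax", "cx")],
    {"ax": 0, "bx": 5, "cx": 7},
)
print(regs["ax"], steps)
```

Modeling `RNI` as its own ROM entry is a simplification made to keep the sketch short; the point is just that even a plain register add walks through a short micro-routine instead of being a single hardwired step.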