Hacker News
by throwaway-9320 2263 days ago
I remember reading somewhere that nowadays a significant chunk of the instructions aren't implemented directly in transistors on the CPU. Instead, CPU microcode more or less emulates them by combining existing ones. Someone correct me if I'm wrong.
4 comments

Micro-ops are the actual things that can be executed by the hardware. A floating-point FMA unit is going to support floating-point addition, subtraction, fused multiply-add (with various intermediate sign twiddles), and integer multiplication and wide multiplication, all without adding much more hardware: you're adding a few xors or muxes to the big, fat multiplier in the middle of it all. Each of these might have distinct micro-ops, or you might be able to separate the processing stages and use a single multiplier micro-op with distinct preprocessing micro-ops for the different instructions. Realistically, though, you are adding new micro-ops, although the overall hardware burden may be light.
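
To make the "few xors or muxes" point concrete, here is a toy Python sketch in which one multiply-add datapath serves several operations just by changing the operands and two sign-flip control bits. The names (`fma_datapath`, `MICRO_OPS`, the control flags) are made up for illustration, and plain Python arithmetic rounds the product and the sum separately, so this only models the control structure, not real fused rounding.

```python
def fma_datapath(a, b, c, negate_product=False, negate_addend=False):
    """Toy shared datapath: computes (+/-)(a * b) + (+/-)c."""
    product = a * b
    if negate_product:
        product = -product          # one sign flip on the multiplier output
    addend = -c if negate_addend else c  # one sign flip on the addend input
    return product + addend

# Hypothetical micro-op table: each entry only picks operands and control bits.
MICRO_OPS = {
    "add":    lambda x, y: fma_datapath(x, 1.0, y),
    "sub":    lambda x, y: fma_datapath(x, 1.0, y, negate_addend=True),
    "mul":    lambda x, y: fma_datapath(x, y, 0.0),
    "fmadd":  lambda x, y, z: fma_datapath(x, y, z),
    "fnmadd": lambda x, y, z: fma_datapath(x, y, z, negate_product=True),
}
```

Every entry reuses the same multiplier and adder, which is the sense in which extra instructions can be cheap in hardware even though each still needs its own micro-op encoding.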

The motivation for adding new instructions is generally higher performance, so there's going to be pressure to have hardware that executes them well, as opposed to a more naive emulation. But sometimes people add support without making it fast: AMD chips used to (still do? I'm not sure) implement the 256-bit AVX instructions by sending the 128-bit halves through their units in sequence, so the chip technically supported AVX instructions but got no speedup from them.
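
A rough sketch of that "two halves in sequence" idea, assuming a hypothetical 128-bit unit that handles four 32-bit lanes per pass. The function names and the pass counter are invented for illustration and do not describe any specific AMD design.

```python
def add_128(a, b):
    """Hypothetical 128-bit unit: adds four 32-bit lanes in one pass."""
    assert len(a) == len(b) == 4
    return [x + y for x, y in zip(a, b)]

def add_256_double_pumped(a, b):
    """One 256-bit add (eight lanes) issued as two passes through add_128.

    Returns the eight-lane result and the number of passes used.
    """
    assert len(a) == len(b) == 8
    low = add_128(a[:4], b[:4])    # first pass: lower 128 bits
    high = add_128(a[4:], b[4:])   # second pass: upper 128 bits
    return low + high, 2
```

The result is correct, but it costs the same two passes that two separate 128-bit instructions would, which is why the wider instruction brings no throughput gain in this model.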

This is true. On the other hand, most of the transistors in the CPU are spent on memory (microcode and cache).
Back in the high CISC era, every instruction would be backed by microcode as a series of steps like "Load the first argument from memory location X; load the address of the second argument from memory location Y; now use that to get the second argument; store the result in memory location Z."
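
As a minimal sketch of what such a microcode sequence might look like, here is a toy interpreter in Python. The step names, the temporary registers, and the `add` step are assumptions for illustration (the quoted sequence leaves the actual operation implicit); no real machine's microcode format is implied.

```python
# Each micro-step is (destination, operation, source).
MICROPROGRAM_ADD_INDIRECT = [
    ("t0",   "load",     "X"),     # load the first argument from location X
    ("addr", "load",     "Y"),     # load the address of the second argument from Y
    ("t1",   "load_via", "addr"),  # use that address to get the second argument
    ("t0",   "add",      "t1"),    # the operation itself (assumed to be an add)
    ("Z",    "store",    "t0"),    # store the result in location Z
]

def run_microprogram(program, memory, operands):
    """Run a microprogram against a dict-based memory.

    operands maps the symbolic names X, Y, Z to concrete addresses.
    """
    regs = {}
    for dest, op, src in program:
        if op == "load":
            regs[dest] = memory[operands[src]]
        elif op == "load_via":
            regs[dest] = memory[regs[src]]
        elif op == "add":
            regs[dest] = regs[dest] + regs[src]
        elif op == "store":
            memory[operands[dest]] = regs[src]
        else:
            raise ValueError(f"unknown micro-step: {op}")
    return memory
```

For example, with `memory = {100: 7, 104: 200, 200: 5, 108: 0}` and operands `{"X": 100, "Y": 104, "Z": 108}`, the single "instruction" performs three memory reads and one write before it finishes.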

Then in the RISC era the instructions being fed to processors more closely matched what was going on inside, though pipelining made that a bit more complicated.

These days a processor will still take the incoming instruction stream and sometimes break up instructions into pieces, but it will also sometimes fuse two instructions into a single one, such as a compare followed by a branch.
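
Both directions can be sketched with a toy decoder in Python. The instruction tuples, the `tmp` register, and the two rules below are invented for illustration; real decoders have much more specific rules about what can be split or fused.

```python
def decode(instructions):
    """Turn a list of (opcode, operand, ...) tuples into a list of micro-ops."""
    uops = []
    i = 0
    while i < len(instructions):
        insn = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if insn[0] == "cmp" and nxt is not None and nxt[0] == "jne":
            # Fusing: compare + conditional branch become a single micro-op.
            uops.append(("cmp_jne",) + insn[1:] + nxt[1:])
            i += 2
        elif insn[0] == "add" and insn[1].startswith("["):
            # Breaking up: an add with a memory destination becomes
            # load, add, store.
            mem, reg = insn[1], insn[2]
            uops.append(("load", "tmp", mem))
            uops.append(("add", "tmp", reg))
            uops.append(("store", mem, "tmp"))
            i += 1
        else:
            # Everything else maps one-to-one.
            uops.append(insn)
            i += 1
    return uops
```

Feeding it `[("add", "[rbx]", "rax"), ("mov", "rcx", "rdx"), ("cmp", "rax", "0"), ("jne", "loop")]` turns four instructions into five micro-ops: the first one splits into three, and the last two merge into one.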

That's not really the case. Many complex, obsolete, or non-timing-critical instructions are microcoded, but the large majority of instructions executed by the CPU are not. They are translated to micro-ops, but that's a different thing: normally there is a single micro-op that executes the bulk of the instruction.