by nitrogen
2850 days ago
Is there a good resource that explains the difference between those ("decode" vs. "trapping") on a modern CPU? When I see "trap" I imagine the kernel catching illegal instruction exceptions and emulating them in software, but it doesn't seem like that's what you mean?
On one CPU model, an x86 operation like "256-bit add" might translate into a single "256-bit add" micro-op. On another model, the same x86 instruction might be translated into a series of micro-ops like "128-bit add, wait a cycle for the first add to finish, pass the carry bit into a second 128-bit add", because that model doesn't have a real 256-bit adder. So the latency of the operation is 2 cycles, but nothing else changes.
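To make the "two 128-bit adds plus a carry" idea concrete, here's a minimal Python sketch of that hypothetical split. It's a toy model, not how any particular CPU implements it; the names `add128` and `add256_split` are made up, and it assumes each 128-bit micro-op takes one cycle and the high half has to wait for the low half's carry.

```python
from __future__ import annotations

MASK128 = (1 << 128) - 1

def add128(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """One hypothetical 128-bit adder micro-op: returns (sum mod 2**128, carry_out)."""
    total = a + b + carry_in
    return total & MASK128, total >> 128

def add256_split(a: int, b: int) -> tuple[int, int]:
    """A 256-bit add done as two dependent 128-bit micro-ops.

    Assumes 0 <= a, b < 2**256. Returns (result mod 2**256, latency_in_cycles),
    where latency assumes one cycle per micro-op and a serial carry dependency.
    """
    a_lo, a_hi = a & MASK128, a >> 128
    b_lo, b_hi = b & MASK128, b >> 128
    lo, carry = add128(a_lo, b_lo)      # cycle 1: add the low halves
    hi, _ = add128(a_hi, b_hi, carry)   # cycle 2: high halves + carry; final carry-out dropped
    return (hi << 128) | lo, 2

# Usage: the carry out of the low half ripples into the high half.
result, latency = add256_split((1 << 128) - 1, 1)
assert result == 1 << 128 and latency == 2
```

The result is the same as a native 256-bit adder would produce; only the latency differs, which is the point of the example above.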
Some x86 instructions are too complicated to be translated into a fixed-length series of micro-ops using a template. For example, integer division, square root, or the string-compare instructions might be loops with conditionals in them that don't execute the same number of micro-ops every time. Intel can implement these as a program written in micro-ops. That program is stored in a microcode ROM on the CPU, and the decoder knows to run it when it encounters the instruction. The OS doesn't need to help here; this is not emulation or software floating point, it's just that the single instruction takes 200 clock cycles. What this does to the out-of-order engine is another story. These "programs", called microcode, can have bugs. Since the ROM itself can't be rewritten, microcode updates are loaded into the CPU at every boot by the BIOS/UEFI and/or the OS to patch them.
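Here's a rough Python sketch of the difference between "template" decoding and a microcode routine. Everything in it is hypothetical: the instruction names, the micro-op names, and the `microcode_strcmp` routine (loosely inspired by x86's repeated string compare) are invented for illustration. The point is only that simple instructions expand to a fixed list, while the microcoded one loops and emits a data-dependent number of micro-ops.

```python
from __future__ import annotations

# Hypothetical fixed micro-op templates for "simple" instructions.
TEMPLATES = {
    "add256": ["add128_lo", "add128_hi_with_carry"],
    "add64": ["add64"],
}

def microcode_strcmp(s1: bytes, s2: bytes) -> list[str]:
    """Hypothetical microcode routine: compare byte by byte, stopping at the
    first mismatch or at the end of the shorter string. The number of
    micro-ops emitted depends on the data, so no fixed template can express it."""
    uops: list[str] = []
    for x, y in zip(s1, s2):
        uops += ["load", "load", "cmp", "branch_if_ne"]
        if x != y:
            break
    return uops

def decode(instr: str, *operands: bytes) -> list[str]:
    """Return the micro-ops a single (toy) instruction expands into."""
    if instr in TEMPLATES:
        return list(TEMPLATES[instr])       # fixed-length expansion
    if instr == "strcmp":
        return microcode_strcmp(*operands)  # hand off to the "microcode" routine
    raise ValueError(f"unknown instruction: {instr}")

# Usage: same instruction, different micro-op counts depending on the operands.
print(len(decode("add256")))                     # 2
print(len(decode("strcmp", b"abcd", b"abxd")))   # 12 (stops at the mismatch)
print(len(decode("strcmp", b"abcd", b"abcd")))   # 16 (runs to the end)
```

In a real CPU the counts and micro-ops would be very different; the sketch only shows why a looping instruction needs a program rather than a template.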
https://en.wikichip.org/wiki/macro-operation
https://en.wikichip.org/wiki/micro-operation
https://en.wikipedia.org/wiki/Microcode