Hacker News new | ask | show | jobs
by titzer 2023 days ago
For 1), just think of instructions of little bundles of bytes. The CPU runs through the instructions in forward order, jumping around to other bits of the code as it goes. X86 has variable-width instructions (i.e from 1 byte up to 17 bytes--X86 is complex and there are a lot of prefix bytes that have been used to add new functionality over the years). To determine how long an instruction is, you need to decode the bits of the instruction. For ARM64, and most other ISAs nowadays, the instructions are all 4 bytes long. That means they can all be decoded in parallel.

For 2, imagine a boa-constrictor swallowing a huge piece of prey. One mouth (CPU: the frontend) and one rear (CPU: the retirement phase). The instructions go in the front end in the program order. They are decoded into operations that pile up in the middle (the giant bulge in the boa constrictor). When an instruction is ready to go, one of the execution ports (3--think of 16 little stomachs) picks up an instruction and executes it. Then at the backend, the retirement phase, instructions are committed in the order they appeared in the original program, so that the program computes the same result.

By making basically all of the pieces of this boa constrictor bigger and more numerous, it eats a lot more instructions per clock (on average). Making that bulge (the reorder buffer) huge allows the CPU to have high chance of some useful work to feed to one of its 16 stomachs.