| Good points. I guess it's a debate of nomenclature, which lacks real importance (although it helps to reduce confusion). My point of view is mostly that, no, the x86 architecture certainly is not load-store, but internally modern x86 machines have execution pipelines that are built like regular load-store pipelines (i.e. the topology and flow is mostly the same as a RISC-style load-store pipeline). Or to put it another way, x86 execution pipelines are much closer to being register-register than being register-memory. > No, they designed and evolved the pipeline and its internal ISA to directly match the x86 ISA they needed to run. Yes. That is very true. Although the front end is the part of the pipeline that is most x86-specific, there are many parts of the rest of the pipeline that is tailored to be optimal for x86 code. It was obviously not designed in a vacuum. An interesting observation is that even other ISA:s and microarchitectures have been influenced by x86 (e.g. by including similar flags registers in the architectural state), in order to not suck at emulation of x86 code. |
Agreed. I wouldn't be pedantic about this for a higher level discussion, but when you start to get into the weeds of what a modern x86 pipeline is, it's starts to matter.
> (i.e. the topology and flow is mostly the same as a RISC-style load-store pipeline).
So that brings up an interesting question. What is a RISC-style pipeline?
Everyone mostly agrees what a RISC-style ISA is, usually focusing on the load/store arch and fixed length instructions (and exposed pipeline implementation details was another common attribute of early RISC ISAs, but everyone now agrees that was a bad idea).
But nobody ever seems to define "RISC-style pipeline", or they allude to overly broad definitions like "Everything that's not CISC" or "any pipeline for a RISC-style ISA". But you might have noticed that I like more precision in my nomenclature. And modern x86 CPUs prove the point that just because something is executing a CISC like ISA doesn't mean the pipeline is also CISC. There is nothing stopping someone implementing a CISC style uarch for a RISC style ISA (other than the fact it's a stupid idea).
I like to make a the bold argument the definition of "RISC-style pipeline" should be limited to simple in-order pipelines that are optimised for high-throughput that approaches one instruction per cycle (or group of instructions for super-scalar designs). The 5-stage MIPS pipeline is probably the most classic example of a RISC pipeline and the one usually taught in cpu architecture courses. But it ranges from 3 stage pipeline of Berkley RISC and early ARM chips to... well I think we have to include the PowerPC core from the PowerPC and Xbox 360 in this definition, and that has something like 50 stages.
(BTW, I also like to exclude VLIW arches from this definition of a RISC-style pipeline, sorry to everyone like Itanium and Transmeta)
Your 8 stage MRISC32 pipeline is a great sample of a modern RISC-style pipeline and along with all the in-order cores from ARM that are often around the same length.
But this does mean I'm arguing that anything with out-of-order execution is not RISC. Maybe you could argue that some simpler out-of-order schemes (like the early powerpc cores) aren't that far from RISC pipelines because they only have very small OoO windows. But by the time you get to the modern high-preformance paradigm of many instruction decoders (or uop caches) and absolutely massive re-order buffers, unified physical register file and many specialised execution units.
It's very much a different architecture to the original RISC-style pipelines, even if they still implement a RISC style ISA. It's an architecture that we don't have a name for, but I think we should. Intel pioneered this architecture starting with the P6 pipeline and perfecting it around the time of Sandybridge, and then everyone else seems to copying it for their high-performance cores.
I'm going to call this arch the "massively out-of-order" for now.
------------
Anyway, I wanted to switch to this way more precise and pedantic definition of "RISC-style pipeline" so that I could point out a key difference in the motivation in why Intel adopted this load-store aspect to their pipeline compared to RISC-style pipeline.
RISC-style pipelines are load-store partly because it results in a much more compact instruction encoding and allows fixed-length instruction, but mostly because it allows for a much simpler and more compact pipeline. The memory stage of the pipeline is typically after the execute stage. If a RISC-style pipeline needed to implement register-memory operations, they would need to move the execute stage after the memory stage completes, resulting in a significantly longer pipeline. And very problematic for the most pure RISC-style pipelines that don't have branch predictors and are relying on short their pipelines and branch-delay slots for branch performance.
A massively out-of-order x86 pipeline gets neither advantage. The instruction encoding is still register-memory (plus those insane read-modify-write instructions), and the cracking the instructions into multiple uops causes extra pipeline complexity. Also, they have good branch predictors.
The primary motivation for cracking those instructions is actually memory latency. Those pipelines want to be able to execute the memory uop as soon as possible so that if they miss in the L1 cache, the latency can be hidden.
This memory latency hiding advantage is also one of the major reasons why modern high-performance RISC cores moved away from the RISC-style pipeline to adopt the same massively out-of-order pipelines as x86. They just have a large advantage in the fact that their ISA is already load/store.