
by phire 881 days ago

> Mitch Alsup calls it GBOoO (Great Big Out-of-Order).

I like that term. Do you have any suggested reading material from Alsup?

------------

> I see what you're pointing at. I don't think that we'll fully agree on the nomenclature,

Ok, I admit I might be going a little far by trying to redefine anything that isn't a classic in-order RISC pipeline as "not RISC" (even when they have a RISC style ISA). And as an amateur CPU architecture historian, I'm massively underqualified to be trying to redefine things.

I'm also not a fan of the fact that my argument defines any pipeline with any amount of OoO as "not RISC". Because I do know the early PowerPC pipeline quite well (especially the 750 pipeline), and the amount of out-of-order is very limited.

There is no reorder buffer. There is no scheduler; instead, dispatch only considers the next two instructions in the instruction queue, and there is only one reservation station per execution pipeline. For the 601, there are only three pipelines (Integer, FPU and Special) and branches are handled before dispatch. So while a branch or FPU instruction might be executed before an Integer instruction, you can't have two instructions for the same pipeline execute out of order.
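To make that concrete, here is a minimal toy model of the dispatch rule described above. It is not a model of the real 601: the instruction names, the pipe labels and the cycle counts are made up, and each pipeline's reservation station and execution unit are collapsed into a single slot.

```python
from collections import deque

def simulate(program, lookahead=2):
    """Toy dispatch: look at the first `lookahead` queue entries each cycle,
    and start any of them whose pipeline slot is free.

    program: list of (name, pipe, cycles) tuples in program order.
    Returns a list of (cycle, name) in the order execution started.
    """
    queue = deque(program)
    busy_until = {}  # pipe -> first cycle at which its single slot is free
    started = []
    cycle = 0
    while queue:
        # Only the front of the queue is visible to dispatch.
        for ins in list(queue)[:lookahead]:
            name, pipe, cycles = ins
            if busy_until.get(pipe, 0) <= cycle:
                busy_until[pipe] = cycle + cycles
                started.append((cycle, name))
                queue.remove(ins)
        cycle += 1
    return started

# Hypothetical program: a slow integer op, then more integer ops and one FPU op.
PROGRAM = [
    ("i1", "int", 3),
    ("i2", "int", 1),
    ("f1", "fpu", 1),
    ("i3", "int", 1),
]

if __name__ == "__main__":
    for cycle, name in simulate(PROGRAM):
        print(cycle, name)
```

In this sketch `f1` starts ahead of `i2` because the FPU slot is free while the integer slot is busy, but `i1`, `i2` and `i3` always start in program order, since they all compete for the same single slot.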

I don't think the 601 even has renaming registers. There is no need, as Integer instructions, Floating instructions AND branch instructions all operate on different register sets (and I'm just realising exactly why PowerPC has those separate condition registers).

Now that I think about it, the 601 pipeline might be described as a superscalar in-order RISC pipeline that simply relaxes the restriction on the different execution pipes starting out of program order.

Maybe I should be altering my argument to allow simpler out-of-order schemes to still be considered RISC. The 601 is certainly not something us people from the future would recognise as OoO, except by the strictest definition of "instructions sometimes execute out of order".

The later PowerPC designs do muddy the water. The 604 (1996) introduces the concept of multiple integer pipelines that can execute the same instructions. They only have one reservation station each, but this allows instructions of the same type to be executed out of order via different pipelines. The load/store instructions were moved to their own pipeline; in the later 750 design (aka the G3, 1997), the load/store pipeline gained two reservation stations, allowing memory instructions to be executed out of order down the same pipeline.

It's not until the PowerPC 7450 (a later G4, 2001) that the PowerPC finally gained something approaching a proper scheduler, removing the one reservation station per pipeline bottleneck.

> E.g. the POWER1 (1990) was the first (non-mainframe) out-of-order CPU with register renaming, and it was a RISC.

As I understand, the POWER1 is about the same as the PowerPC 601. There is no register renaming, the only out-of-order execution is the fact that branch instructions execute early, and floating point instructions can execute out of order with respect to integer instructions.

I don't think there is a RISC cpu with register renaming until the PowerPC 604 in 1996 or maybe PowerPC 750 in 1997, and that was very limited, only a few renaming registers.

---------------

> but this kind of feels like the RISC vs CISC debate all over again

Yes. And my viewpoint originates from my preferred answer to the RISC vs CISC debate: that they are outdated terms that belong to the high-performance designs of the 80s and early 90s, and don't have any relevance to modern GBOoO designs (though RISC does continue to be relevant for low-power and low-area designs).

> I guess one of my main motivations is to make people understand that x86 is no longer CISC "under the hood"

We both agree that GBOoO designs aren't CISC. I'm just taking it a step further in saying they aren't RISC either.

But my preferred stance leads to so many questions. If such designs aren't RISC then what are they? Where should the line between RISC and not-RISC be drawn? If we are allowing more than just two categories, then how many more do we need?

It's certainly tempting to adopt your "everything is either CISC or RISC" stance just to avoid those complicated questions, but instead I try to describe lines.

And I think you agree with me that having accepted definitions for groupings of related microarchitectures would be useful, even if you want them to be sub-categories under RISC.

> BTW, w.r.t. nomenclature, I make a clear distinction between "architecture" and "microarchitecture" (even if I mix up contexts at times).

Yeah, I try to avoid "architecture" altogether, though I often slip up. I use ISA for the instruction set and microarchitecture or uarch for the hardware implementation.

----

> However, the key takeaway that has stood the test of time is an instruction set that enables fully pipelined execution...

So I agree with all this. I think what I'm trying to do (this conversation is very helpful for thinking through things) is add an additional restriction that RISC is also about trying to optimise that pipeline to be as short as possible.

Pipeline length is very much the enemy for in-order pipelines. The longer the back-end is, the more likely you are to have data hazards. And a data hazard is really just a multi-cycle instruction in disguise. This is a major part of the reason why RISC always pairs with load/store. Also, the more stages you have in the front-end, the larger your branch misprediction delay (and in-order pipelines are often paired with weak branch predictors, if they have one at all).

But the switch to the GBOoO style architecture has a massive impact on this paradigm. Suddenly, pipeline length stops being so critical. You still don't want to go crazy, but now your scheduler finds different instructions to re-order into the gaps that would have been data hazard stalls in an in-order design. And part of the price you pay for GBOoO is a more complex frontend (even a RISC ISA requires extra complexity for OoO over In-order), but you are happy to pay that cost because of the benefits, and the complex branch predictors help mitigate the downsides.
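A minimal sketch of that difference, under loud assumptions: a single-issue toy machine, made-up latencies (4 cycles for a load, 1 for everything else), and a program whose destination registers are all unique (i.e. already renamed), so only true data dependencies matter. None of this models a real pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    dest: str       # destination register (assumed unique per instruction)
    srcs: tuple     # source registers
    latency: int    # cycles until the result is available

def run_in_order(program):
    """Issue strictly in program order, stalling on unready sources."""
    ready_at = {}   # register -> cycle its value becomes available
    cycle = finish = 0
    for ins in program:
        start = max([cycle] + [ready_at.get(s, 0) for s in ins.srcs])
        ready_at[ins.dest] = start + ins.latency
        finish = max(finish, start + ins.latency)
        cycle = start + 1
    return finish

def run_out_of_order(program, window=8):
    """Each cycle, issue the oldest instruction in the window whose sources are ready."""
    ready_at = {}
    pending = list(program)
    cycle = finish = 0
    while pending:
        unissued = {ins.dest for ins in pending}
        for ins in pending[:window]:
            if all(s not in unissued and ready_at.get(s, 0) <= cycle for s in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, cycle + ins.latency)
                pending.remove(ins)
                break
        cycle += 1
    return finish

# Hypothetical program: two load-then-use pairs plus one independent multiply.
PROGRAM = [
    Instr("a", (), 4),          # load
    Instr("b", ("a", "x"), 1),  # uses the load result
    Instr("c", (), 4),          # load
    Instr("d", ("c", "y"), 1),  # uses the load result
    Instr("e", ("p", "q"), 1),  # independent work
]

if __name__ == "__main__":
    print("in-order:", run_in_order(PROGRAM))          # 11 cycles
    print("out-of-order:", run_out_of_order(PROGRAM))  # 6 cycles
```

The in-order run spends its time stalled behind each load, while the windowed run slots the second load and the multiply into those gaps. Shrinking `window` to 1 makes the second function behave like the first.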

(I don't know where Alsup wants to draw the line for GBOoO, but I'm talking about OoO designs with proper schedulers and ROBs with dozens of entries. Designs like the early PowerPC chips with their limited OoO don't count; they were still very much optimising for short pipeline lengths.)

I'm arguing that this large paradigm shift in design is enough justification to draw a line and limit RISC to just the classic RISC style pipelines.

> the NexGen Nx686 (1995, later AMD K6), was also out-of-order, and was said to have a RISC microarchitecture (based on RISC86).

I don't like relying on how engineers described or how the marketing team branded their CPU design for deciding if a given microarchitecture is RISC or not. RISC was more of a buzzword than anything else, and the definition was muddy.

A major point against the NexGen line being RISC (including all AMD designs from the K6 to Bulldozer etc) is that they don't crack register-memory instructions into independent uops. I pointed this out previously, but their integer pipelines can do a full read/modify/write operation with a single uop. I don't know about you, but I'm pretty attached to the idea that RISC must be load/store.
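For anyone unsure what "cracking" means here, a rough sketch. The uop tuple format, the temporary names `t0`/`t1` and the bracket syntax for memory operands are all invented for illustration; this is not how any real decoder represents things.

```python
def is_mem(operand):
    """Treat anything written as [..] as a memory operand (illustrative syntax)."""
    return operand.startswith("[") and operand.endswith("]")

def crack(op, dst, src):
    """Split a two-operand `dst = dst op src` instruction into load/store-style uops.

    Each uop is a tuple: (opcode, destination, *sources).
    """
    uops = []
    if is_mem(src):
        uops.append(("load", "t0", src))   # bring the memory source into a temp
        src = "t0"
    if is_mem(dst):
        uops.append(("load", "t1", dst))   # read
        uops.append((op, "t1", "t1", src)) # modify
        uops.append(("store", dst, "t1"))  # write
    else:
        uops.append((op, dst, dst, src))
    return uops

if __name__ == "__main__":
    print(crack("add", "[rbx]", "rax"))
    print(crack("add", "rax", "rcx"))
```

A load/store-style backend would see the three separate uops for `add [rbx], rax`; the designs described above instead carry the whole read/modify/write as one uop.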

This is also part of the reason why I want more terms than just RISC and CISC. Because the NexGen line is clearly not CISC either.

And we also have to consider the other x86 designs from the 486 and 586 era. They are fully pipelined and even superscalar, but they don't crack up register-memory ops, and their pipelines haven't been optimised for length, so it would be wrong to label them as RISC or RISC-like.

But they are so far from the "state-machine style microarchitectures" (and I think that's a perfectly fine term) that CISC originated from that I think it's very disingenuous to label them as CISC or CISC-like either.

> For most intents and purposes most GBOoO microarchitectures are comparable when it comes to the execution pipeline, regardless of which ISA they are using. The main differences are in the front end - but even there many of the principles are the same

The execution pipelines themselves might be very comparable, but you are forgetting the scheduling, which adds massively to backend complexity, and makes a major impact to the overall microarchitecture and the design paradigms.


mbitsnbites 881 days ago

> I like that term. Do you have any suggested reading material from Alsup?

He dwells in the comp.arch newsgroup, and is usually happy to answer questions.

His take on RISC: https://groups.google.com/g/comp.arch/c/UvprSM9xJfM

His ISA, My 66000: https://groups.google.com/g/comp.arch/c/SlbYDIPZjH0/m/CLkxJH...

Since he designed the Motorola 88000 ISA, I assume that he had a hand in the MC88100 user's manual: http://www.bitsavers.org/components/motorola/88000/MC88100_R...

He also has quite a few interesting patents: https://patents.justia.com/inventor/mitchell-alsup

> As I understand, the POWER1 is about the same as the PowerPC 601. There is no register renaming

Yes, it's a stretch to call it an OoO implementation with register renaming. I think that it does register renaming, but only for the FPU, and only in a very conservative way. I was merely pointing out that OoO and register renaming weren't really pioneered in x86 architectures in the late 1990s. As usual, it's an incremental process, with no clear start.

> and the definition was muddy

And it still is ;-)

> This is also part of the reason why I want more terms than just RISC and CISC. Because the NexGen line is clearly not CISC either.

Agree. I think that many of the confusions are around ISA and microarchitecture. E.g. do all x86 CPUs count as CISC, since the ISA is CISC? Does the internal uOP instruction set qualify as an "architecture" (ISA)? How about the decoded "operation packets" that flow down the execution pipeline? Can you even say that a microarchitecture is "RISC", or is that term reserved for the architecture?

I think I lack a good term for "a straight pipeline without loops" (or thereabout), which is kind of the original watershed between CISC and RISC, back when architecture and microarchitecture were still very intimately coupled (remember, CISC too exposed too much of the microarchitectural details, which is one of the main reasons that we have an "x86 tax" in the first place).

z/Architecture is an interesting extreme to bring into the mix, since even though it does pipelining, most of the implementations have a fairly complex "graph" rather than a straight line (AFAICT). It can't be made to resemble RISC even if you squint, whereas it can be hard to tell the pipelines for AArch64, POWER9 and x86-64 implementations apart at a quick glance: They all have a very clear and straight flow.

> this conversation is very helpful for thinking through things

Exactly :-) I love these discussions as I find that when I try to explain or argue a certain topic, many pieces fall into place, plus of course you pick up and learn tons from people like you who know a lot and clearly have given these questions some thought.

> but you are forgetting the scheduling, which adds massively to backend complexity, and makes a major impact to the overall microarchitecture and the design paradigms

I think I'm merely dismissing it as "yet another major technical improvement" (right up there with pipelining, caches, branch prediction, and superscalar execution). It surely is a major microarchitecture paradigm, and it does seem to be one that will stick for a long time, but I'm still reluctant to compare it to CISC or RISC, which in my mind talk more about the architecture, whereas GBOoO talks more about the microarchitecture, and is largely ISA agnostic (these days). Gaah, it's hard. Going back to the question about ISA vs uarch again.

Anyway, this has been a very good talk. Thanks!


phire 881 days ago

> Merely pointing out that OoO and register renaming weren't really pioneered in x86 architectures in the late 1990s. As usual, it's an incremental process, with no clear start.

Yeah, the incremental nature makes it hard to try and classify things.

What I want to say is that x86 was the first to combine out-of-order execution, register renaming, a complex unified scheduler, and a large enough ROB to get the advantage of memory latency hiding.

Though I don't even know if that is true; perhaps there is some obscure mainframe CPU that got there first. Or perhaps the 40-entry uop ROB of the Pentium Pro isn't actually large enough to get that memory latency hiding advantage, and some OoO RISC processor actually got there first.

> I think that many of the confusions are around ISA and microarchitecture... Can you even say that a microarchitecture is "RISC", or is that term reserved for the architecture?

Yeah. It would make sense to argue that a microarchitecture can only be RISC if it was designed in parallel with its ISA.

And it's worth noting the RISC philosophy started before microprocessor engineers even started reusing ISAs across multiple generations of microarchitectures. Binary backwards compatibility did happen in the mainframe and minicomputer world, but I can't think of any examples that were released before the Berkeley and Stanford RISC projects started in 1979 and 1981 (the 286 is the first example I can think of, released in 1982).

So that was the era where every new microprocessor was a new ISA. I don't think people started talking about microarchitectures until much later.

> I think I lack a good term for "a straight pipeline without loops" (or thereabout), which is kind of the original watershed between CISC and RISC,

I label such designs as "fully-pipelined". Though I'm not too strict, as long as most of the common instructions are fully pipelined, because there are plenty of RISC designs which aren't fully pipelined, with multi-cycle divide (or sometimes even multiply) instructions.

> but I'm still reluctant to compare it to CISC or RISC, which in my mind talk more about the architecture, whereas GBOoO talks more about the microarchitecture, and is largely ISA agnostic (these days).

Which is probably how we all ended up in this mess in the first place. CISC and RISC both required that the ISA and microarchitecture be designed in parallel, complementing each other to get the best implementation.

One of GBOoO's major advantages is that it's ISA agnostic, and that's the exact reason why the x86 designers gravitated towards it, as they had the restriction of their legacy CISC ISA that was having problems competing with RISC designs.

But because GBOoO is ISA agnostic, nobody ever designed an ISA for it (at least not until AArch64, but I suspect that was only partially designed for GBOoO). And because there is no ISA, the bulk of the programming and tech community doesn't hear about it in the same way they hear about RISC and CISC. I mean, we don't even have a commonly accepted name for it other than "out-of-order".


mbitsnbites 881 days ago

BTW, do you have any blog, public repos or anything else?


phire 881 days ago

I really should do a blog. I have plenty of good comments and discussions (like this one) spread randomly throughout hacker news and reddit, that could do with being fleshed out and published.

And recently I've been pondering the idea of making youtube videos.

I do have a GitHub, but it's not really intended for public consumption, just a bunch of half-finished projects. My current project is experimentation towards high-performance cycle-accurate emulators: https://github.com/phire/bus-mu

I'm currently working towards the Nintendo64 (the homebrew community could really do with a fully accurate emulator, as so much of the performance is dictated by bus contention) but I hope the approach will be performant enough to implement cycle accurate versions of later consoles like the xbox (Intel P6 pipeline) and gamecube (PowerPC 750 pipeline).
