by Azerb 2520 days ago
The irony here is that most modern CISC designs break instructions into RISC-like μOps. Moore's law also means you have more transistors for the same area, so now you have to figure out how to use them creatively to increase performance. Workloads are constantly evolving, and hardware evolves with them to make those workloads fast.
3 comments

> The irony here is that most modern CISC designs break instructions into RISC-like μOps.

... as well as combining two (or even more?) instructions into one μOp.

https://en.wikichip.org/wiki/macro-operation_fusion
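
To make the fusion idea concrete, here is a minimal toy sketch in Python. It is not how any real decoder works: the `(mnemonic, operands)` tuples, the `FUSIBLE_FIRST` set and the `fuse` helper are all hypothetical names made up for illustration. It only shows the shape of the transformation, where a flag-setting instruction followed by a conditional jump leaves the decoder as a single op.

```python
from typing import List, Tuple

# Hypothetical toy instruction format: (mnemonic, operands). Not a real ISA model.
Instr = Tuple[str, str]

# Assumption for this sketch: only these flag-setting ops are fusion candidates.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """Treat any j* mnemonic except the unconditional jmp as a conditional jump."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def fuse(stream: List[Instr]) -> List[Instr]:
    """Merge each (cmp|test, conditional jump) pair into one fused op."""
    out: List[Instr] = []
    i = 0
    while i < len(stream):
        op, args = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if op in FUSIBLE_FIRST and nxt is not None and is_conditional_jump(nxt[0]):
            # Two instructions in, one fused op out.
            out.append((f"{op}+{nxt[0]}", f"{args} -> {nxt[1]}"))
            i += 2
        else:
            out.append((op, args))
            i += 1
    return out


program = [("mov", "eax, 1"), ("cmp", "eax, ebx"), ("jne", "loop"), ("add", "eax, 2")]
fused = fuse(program)
print(fused)
```

Real hardware has many more constraints on which pairs qualify; the wikichip page above covers those.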

CISC since around the turn of the millennium is basically a custom-tuned, high-decode-speed data compression codec for RISC-like micro-ops. It's been a very long time since anyone designed a CISC processor that actually ran (non-trivial) CISC instructions directly in silicon.

The root of CISC's persistent dominance over true RISC instruction sets is that memory bandwidth is far lower than what would be needed to feed micro-ops directly into the CPU. It makes sense to solve that by compressing the instruction stream. RISC looks far better on paper in every other way if you ignore memory bandwidth and latency issues.

That being said, I've wondered for many years about whether a more conscious realization of this might lead to a more interesting design. Maybe instead of CISC we could have CRISC, Compressed Reduced Instruction Set Computer? Instead of CISC you'd have some kind of compression codec that defines macros dynamically. I'm sure X64 and ARM64+cruft are nowhere near optimal compression codecs for the underlying micro-op stream. If someone wants to steal that idea and run with it, be my guest.
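
As a rough illustration of the "macros over a micro-op stream" idea, here is a toy Python sketch. Everything in it is an assumption for illustration: the micro-op names, the `build_macros` and `expand` helpers, and the choice of greedy pair replacement (similar in spirit to byte-pair encoding) as the codec. It builds the macro table offline from one stream rather than dynamically in hardware, so it only shows the compression side of the idea.

```python
from collections import Counter


def build_macros(uops, max_macros=2):
    """Repeatedly replace the most frequent adjacent pair with a new macro symbol."""
    stream = list(uops)
    table = {}
    for n in range(max_macros):
        pairs = Counter(zip(stream, stream[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # a macro used only once saves nothing
        name = f"M{n}"  # hypothetical macro opcode name
        table[name] = pair
        merged, i = [], 0
        while i < len(stream):
            if i + 1 < len(stream) and (stream[i], stream[i + 1]) == pair:
                merged.append(name)
                i += 2
            else:
                merged.append(stream[i])
                i += 1
        stream = merged
    return stream, table


def expand(stream, table):
    """Decode: recursively expand macro symbols back into plain micro-ops."""
    out = []
    for sym in stream:
        if sym in table:
            out.extend(expand(list(table[sym]), table))
        else:
            out.append(sym)
    return out


# Made-up micro-op stream, purely for demonstration.
uops = ["load", "add", "store", "load", "add", "store", "load", "add", "mul"]
compressed, macros = build_macros(uops)
print(compressed, macros)
```

Note that the `table` has to be available to the decoder before it can expand anything, which is exactly the extra state a hardware version would have to carry.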

The other advantage of CISC is that it acts like a higher-level API. Many early RISC designs suffered because they were so low-level that early implementation details (like wait states) had to be "emulated" in later processors for compatibility.

For that reason alone, it might not be advantageous to just compress a stream of RISC instructions rather than use higher-level instructions made up of micro-ops.

Dynamically swapping compression like that probably isn't worth it, as now the decode tables are extra state that needs to be compared against, all while inside a critical path.

But most modern RISCs do take a sort of Huffman encoding perspective on ISA design, starting with SH, into Thumb(2), and into RVn-C. I do agree that we can probably go farther; memory-referencing ALU ops, for instance, can be thought of as a way of addressing PRF registers without using any bits in the instruction stream.
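
The "Huffman perspective" can be sketched with a few lines of Python: give frequent instructions short encodings and rare ones long encodings. The opcode frequencies below are invented for illustration, and `huffman_lengths` is a hypothetical helper. Real compressed ISAs pick from a couple of fixed instruction widths rather than using true bit-granular Huffman codes, so treat this only as the underlying intuition.

```python
import heapq


def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries: (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


# Invented opcode frequencies (percent of dynamic instructions), not measured data.
freqs = {"mov": 40, "add": 25, "load": 20, "mul": 10, "div": 5}
lengths = huffman_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / sum(freqs.values())
print(lengths, avg_bits)
```

With five opcodes a fixed-width encoding needs 3 bits per opcode, so any average below that is the saving the variable-length scheme buys for this made-up mix.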

You mean like Thumb-2?
Although the micro-operations are dispatched to many parallel execution units, so it's really better described as VLIW.
No, that's just superscalar. The defining feature of VLIW is that the compiler schedules the dispatch.
The ports/schedulers have quite different capabilities however:

https://en.wikichip.org/wiki/intel/microarchitectures/skylak...

Using the Intel VTune tools you can see how each port is utilized, so you could in theory change your code to mix instructions for best utilization, beyond what the CPU's own reordering can achieve. So I can see some analogy with building a VLIW instruction group.

There's a crazy amount of performance counters you can look at (the perf tool can do that too, but just try running "perf list" to view available counters).