by weebull 107 days ago
All of those things are solved with modern extensions. It's like comparing pre-MMX x86 code with modern x86. Misaligned loads and stores are Zicclsm, bit manipulation is Zb[abcs], atomic memory operations are made mandatory in Ziccamoa.

All of these extensions are mandatory in the RVA22 and RVA23 profiles and so will be implemented on any up to date RISC-V core. It's definitely worth setting your compiler target appropriately before making comparisons.


Ubuntu being RVA23 is looking smarter and smarter.

The RISC-V ecosystem being handicapped by backwards compatibility does not make sense at this point.

Every new RISC-V board is going to be RVA23 capable. Now is the time to draw a line in the sand.

I’d be kind of depressed if every new RISC-V board was not RVA23 capable.
But RISC-V is a _new_ ISA. Why did we start out with the wrong design that now needs a bunch of extensions? RISC-V should have taken the learnings from x86 and ARM but instead they seem to be committing the same mistakes.
I was a bit shocked by the headline, given how poorly ARM and x86 compare to RISC-V in speed, cost, and efficiency ... in the MCU space where I near-exclusively live and where RISC-V has near-exclusively lived up until quite recently. RISC-V has been great for RTOS systems and Espressif in particular has pushed MCUs up to a new level where it's become viable to run a designed-from-scratch web server (you better believe we're using vector graphics) on a $5 board that sits on your thumb, but using RISC-V in SBCs and beyond as the primary CPU is a very different ballgame.
I have a couple c3 I was playing with. Are you talking about the P4 or C6? Aren't their xtensa offerings still faster?
It's not the wrong design; RISC-V is designed around extensions, and they left room in the instruction encoding for them. They don't have an 800-lb gorilla like Intel shoving the ISA down customers' throats (Canonical is the closest thing) so there is some debate on which combination of extensions is needed for desktop apps.
FWIW I wrote this article a while back all about RISC-V extensions and how they work at a low level: https://research.redhat.com/blog/article/risc-v-extensions-w... page 22 in this PDF: https://research.redhat.com/wp-content/uploads/2023/12/RHRQ_...
> They don't have a 800-lb gorilla like Intel shoving the ISA down customers' throats

Nobody really forces you to use x64 if you don't like it, just as nobody forced you to use Itanium — which Intel famously failed to "shove down the customers' throats" btw.

It is a reduced instruction set computing isa of course. It shouldn't really have instructions for every edge case.

I only use it for microcontrollers and it's really nice there. But yeah I can imagine it doesn't perform well on bigger stuff. The idea of risc was to put the intelligence in the compiler though, not the silicon.

> It shouldn't really have instructions for every edge case.

Depends on what the instruction does. If it goes through a four-loads-four-stores chain that VAXen could famously do (with pre- and post-increments), then sure, this makes it impossible to implement such an ISA in a multiscalar, OOO manner (DEC tried really, really hard and couldn't do it). But anything that essentially bit-fiddles in funny ways with the 2 sets of 64 bits already available from the source registers, plus the immediate? Shove it in, why not? ARM has had bit-shifted immediates available for almost every instruction since ARMv1. And RISC-V also finally gets shNadd instructions which are essentially x86/x64's SIB byte, except available as a separate instruction. It got "andn" which, arguably, is more useful than pure NOT anyway (most uses of ~ in C are in expressions of the "var &= ~expr..." variety) and costs almost nothing to implement. Bit rotations, too, including rev8 and brev8. Heck, we even got max/min instructions in RISC-V because again, why not? The usage is incredibly widespread, the implementation is trivial, and it makes life easier both for HW implementers (no need to try to macrofuse common instruction sequences) and the SW writers (no need either to invent those instruction sequences and hope they'll get accelerated or to read manufacturers' datasheets for "officially" blessed instruction sequences).

As proven by x86/x64 and ARM evolution, going all in on pure RISC doesn't pay off, because there is only so much compilers can do in an AOT deployment scenario.
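
For anyone who doesn't have the Zba/Zbb mnemonics memorised, here is a rough Python model of what the instructions named above compute on RV64. It is a sketch of the semantics only, not of any hardware implementation; the function names just mirror the mnemonics, and `XLEN`, `MASK` and `to_signed` are helpers made up for illustration.

```python
XLEN = 64
MASK = (1 << XLEN) - 1  # registers are modelled as unsigned 64-bit ints

def shnadd(n, rs1, rs2):
    """sh1add/sh2add/sh3add: rs2 + (rs1 << n), i.e. base + index * 2**n."""
    assert n in (1, 2, 3)
    return (rs2 + (rs1 << n)) & MASK

def andn(rs1, rs2):
    """andn: rs1 & ~rs2, the one-instruction form of `var &= ~expr`."""
    return rs1 & ~rs2 & MASK

def rev8(rs1):
    """rev8: reverse the order of the bytes in the register."""
    return int.from_bytes(rs1.to_bytes(XLEN // 8, "little"), "big")

def brev8(rs1):
    """brev8: reverse the bits inside each byte; byte order is unchanged."""
    out = bytes(int(f"{b:08b}"[::-1], 2) for b in rs1.to_bytes(XLEN // 8, "little"))
    return int.from_bytes(out, "little")

def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def rv_min(rs1, rs2):
    """min: signed minimum (minu would compare the raw unsigned values)."""
    return rs1 if to_signed(rs1) <= to_signed(rs2) else rs2
```

For example, `shnadd(2, i, base)` is the address of element `i` in an array of 4-byte items, which is the job the SIB byte does on x86.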
> The idea of risc was to put the intelligence in the compiler though, not the silicon.

Itanium made this mistake. Sure, compilers are much better now, but dynamic scheduling still beats static scheduling for real-world tasks. You can (almost perfectly) statically schedule matrix multiplication but not a UI or a 3D game.

Even GPUs have some amount of dynamic scheduling now.

It was kind of an experiment from the start. Some ideas turned out to be good, so we keep them. Some ideas turned out not to be good, so we fix them with extensions.
The problem with hardware experiments is that the people owning the hardware are stuck with the experiments.
Sure, but if you bought a dev board with an experimental ISA I think you knew what you were getting into.
If your hardware is new, you get the nicest extensions though. You just don’t use the bad parts in your code.
Sure, if you are developing software for the computer you own, instead of supporting everyone.
Re-compile?
I mean, that is often what you do in embedded computing: you (re)sell hardware with one particular application.
It's hard to imagine a student putting together an RVA23 core in a single semester. And you don't really want that in the embedded roles RISC-V has found a lot of success in either.
Relatively new, we're about 16 years down the road.
16 years from the START of getting an idea "why don't we make a new ISA?".

Less than 7 years from ratification of the initial RV{32,64}GC spec.

Less than 5 years from the first mass-produced roughly original Raspberry Pi level $100 SBC: AWOL Nezha, shipped June 2021.

Intentionally. Back then the guys were saying that everything could be solved by raw power.
>Misaligned loads and stores are Zicclsm

Nope. See https://github.com/llvm/llvm-project/issues/110454 which was linked in the first issue. The spec authors have managed to make a mess even here.

Now they want to introduce yet another (sic!) extension Oilsm... It maaaaaay become part of RVA30, so in the best case scenario it will be decades before we will be able to rely on it widely (especially considering that RVA23 is likely to become heavily entrenched as "the default").

IMO the spec authors should've mandated that the base load/store instructions work only with aligned pointers and introduced misaligned instructions in a separate early extension. (After all, passing a misaligned pointer where your code does not expect it is a correctness issue.) But I would've been fine as well if they mandated that misaligned pointers should always be accepted. Instead we have to deal with the terrible middle ground.

>atomic memory operations are made mandatory in Ziccamoa

In other words, forget about potential performance advantages of load-link/store-conditional instructions. `compare_exchange` and `compare_exchange_weak` will always compile into the same instructions.

And I guess you are fine with the page size part. I know there are huge-page-like proposals, but they do not resolve the fundamental issue.

I have other minor performance-related nits, such as the `seed` CSR being allowed to produce poor quality entropy, which means that we have to bring a whole CSPRNG if we want to generate a cryptographic key or nonce on a low-powered micro-controller.

By no means do I consider myself a RISC-V expert; if anything, my familiarity with the ISA as a systems language programmer is quite shallow. But the number of accumulated disappointments even from such shallow familiarity has cooled my enthusiasm for RISC-V quite significantly.

RISC-V truly is the RyanAir of processors: Oh, you want FP maths? That's an optional extra, did you check that when you booked? And was that single or double-precision, all optional extras at an extra charge. Atomic instructions, that's an extra too, have your credit card details handy. Multiply and divide? Yeah, extras. Now, let me tell you about our high-end customer options, packed SIMD and user-level interrupts, only for business class users. And then there's our first-class benefits, hypervisor extensions for big spenders, and even more, all optional extras.
So it's modular. This is normally considered a good thing. It means you don't have to pay for features you don't need.

The ISA is open so there's no greedy corporation trying to upsell you. I mean there's an implementation and die area cost for each extension but it's not being set at an artificial level by a monopolist.

There's a good chance you're actually paying more for the features you don't need. Preparing an EUV mask set costs something like 30 million dollars (that figure may be out of date, i.e. it could be more now). So instead of a single mask set with everything on the device, whether you need it or not, you're paying $30 million for each special-snowflake variant. This is why vendors do a one-size-fits-all version of many of their products and then disable the extra functionality for the cheaper market segments, because it's much, much cheaper than making separate reduced-functionality devices.
It's a good thing in many cases but not if you're going to be running applications distributed as binaries. Maybe if we go the Gentoo route of everybody always recompiling everything for their own system?
Then you stick to RVA23, which is comparable to ARMv9 and x86-64v4.
RVA23 is, finally, the belated admission that maybe we shouldn't have everything as optional extras. Hopefully it'll take off, I can't imagine what sort of a headache it is for maintainers of repos who have to track a dozen different variants of binaries depending on which flavour of RISC-V the apt-get is coming from.
But that means a port of Linux can’t be to RISC-V, it has to be to a specific implementation of RISC-V, or if sufficient (which seems still debatable) to a specific common RISC-V profile.
>which seems still debatable

In what way are RISC-V profiles debatable? Canonical is spearheading the RVA23-as-a-default movement and so far, it seems that there are no heavy objections towards that effort (beyond the usual "Canonical sucks" shtick that you see in every discussion involving Canonical)

You can target the minimum instruction set and it'll run everywhere. Albeit very slowly. Perhaps you use a fat binary to get reasonable performance in most cases.

This isn't easy but it can be done (and it is being done on x86, despite constantly evolving variations of AVX).
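
The fat-binary idea boils down to shipping several builds of the same routine and choosing one at load time. A minimal sketch of that selection logic follows. The variant names and extension sets are illustrative placeholders, not any real toolchain's multiversioning tables, and how the hardware's feature set gets discovered is deliberately left out.

```python
# Each variant lists the extensions it was compiled to require (hypothetical sets).
# Ordered from most to least demanding; the last entry is the baseline
# that is expected to run everywhere, albeit slowly.
VARIANTS = [
    ("vector",   frozenset({"rv64gc", "zba", "zbb", "v"})),
    ("bitmanip", frozenset({"rv64gc", "zba", "zbb"})),
    ("baseline", frozenset({"rv64gc"})),
]

def pick_variant(hw_extensions, variants=VARIANTS):
    """Return the name of the first variant whose requirements the hardware meets."""
    hw = frozenset(hw_extensions)
    for name, required in variants:
        if required <= hw:  # subset test: every required extension is present
            return name
    raise RuntimeError("no variant can run on this hardware")
```

Note that hardware with `v` but without the bit-manipulation extensions falls all the way back to the baseline in this sketch, which is the kind of combinatorial awkwardness profiles are meant to remove.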

Interestingly, RISC-V vector extensions are variable length.

So, you can compile your RISC-V software to require the equivalent of AVX and it will run on whatever size vectors the hardware supports.

So, on x86-64, if I write AVX2 software and run it on AVX512 capable hardware, I am leaving performance on the table. But if I write software that uses AVX512, it will not run on hardware that does not support those extensions (flags).

On RISC-V, the same binary that uses 256 bit vectors on hardware that only supports that will use 512 bit vectors on hardware that supports it, or even 1024 bit vectors on hardware like the A100 cores of the SpacemiT K3.

So, I guess X86-64 is the RyanAir of processors.
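
A toy model of why one binary can adapt to the vector width: the loop asks the hardware how many elements it may process this iteration and advances by that amount. This is a simplified sketch, assuming LMUL=1 and the simplest `vsetvli` behaviour (`vl = min(remaining, VLMAX)`); the function names are invented for illustration and real hardware has more latitude than this.

```python
def vsetvl(avl, vlen_bits, sew_bits):
    """Simplified vsetvli with LMUL=1: grant min(requested, VLMAX) elements."""
    vlmax = vlen_bits // sew_bits
    return min(avl, vlmax)

def vector_add(a, b, vlen_bits, sew_bits=32):
    """Strip-mined element-wise add; returns (result, loop iterations)."""
    assert len(a) == len(b)
    out, i, iterations = [], 0, 0
    while i < len(a):
        vl = vsetvl(len(a) - i, vlen_bits, sew_bits)
        # One "vector instruction" handles vl elements at once.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        iterations += 1
    return out, iterations
```

The same `vector_add` "binary" produces identical results for `vlen_bits` of 256, 512 or 1024; only the iteration count changes, because nothing in the loop hard-codes the vector length.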

I don't agree with that comparison.

RyanAir is about exploiting consumers, with bait-and-switch and shitty terms and conditions.

RISC-V's modularity is about giving choice to hardware designers, so they can pick and choose just those features that their solution needs, and even allow for custom extensions.

RISC-V's modularity is for academia. 1) for education, where students learn/use/work on simple processors, 2) for research in new types of hardware and extensions, where ease of implementation or ease of creating a custom extension is important.

Extensions are not just for academia. If I am building a microcontroller to control the storage media I am selling (eg. hard drives), why do I need to implement a bunch of features I am not going to use? What about my flow rate monitor? Or my pacemaker?

In some of these, less silicon means less power means more better. Like that last example.

Then x86_64 is the cable television service of processors. "Oh, you want channel 5? Then you have to buy this bundle with 40 other channels you will never watch, including 7 channels in languages you do not speak."
>Multiply and divide

And where it actually mattered they did not introduce a separate extension. Integer division is significantly more complex than multiplication, so it may make sense for low-end microcontrollers to implement in hardware only the latter.

There is Zmmul for multiplication-but-not-divide.
RyanAir is the least expensive right? And it still gets you there?

I would be ok with that if it was a valid analogy.

It is valid in microcontroller land. There, the chip and the software are provided by the same party. So you can select for exactly the RISC-V features you need and save yourself some silicon. That sounds like a win to me.

At the application level, like a server or a desktop, that would be a disaster because I get my hardware and software from different people. How do the software guys know what hardware to target? Well, that is exactly why RVA23 exists.

What does RVA23 mean? It is the RISC-V "Application" profile. It allows you to build software to a single hardware target and trust that hardware makers will target the same profile. RVA23 is like saying x86-64v4. Both are simple names for a long list of extensions (flags) and assumptions that you expect the hardware to honour. So, when Ubuntu 26.04 says it requires RVA23, it means that all the software built on it can assume those features. No a la carte.

The reason RVA23 is getting so much attention is that it has essentially the same feature set as modern ARM64 or x86-64. Software will be able to target this profile for a long time. There may be a new profile in a few years time, like RVA30, but hardware that implements that will still run RVA23 software (just as x86-64v4 hardware will run x86-64v1 software). Hardware built for profiles before RVA23 may be missing features modern applications expect.

I guess you could say that RVA23 is British Airways Business Class.

If you really want to support hardware designed before RVA23, almost everything you would want to run pre-built software on supports RVA20. And again, your RVA20 stuff will run fine on RVA23 hardware (but with fewer features--like no vectors). So maybe no in-flight meal, but it will get you there.

Yes, adding instructions to your ISA has a cost
I think having separate unaligned load/store instructions would be a much worse design, not least because they use a lot of the opcode space. I don't understand why you don't just have an option to not generate misaligned loads for people that happen to be running on CPUs where it's really slow. You don't need to wait for a profile for that.

As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient. By the time you get to CPUs where portable code is important a CSPRNG is probably fine.

I agree about page size though. Svnapot seems overly complicated and gives only a fraction of the advantages of actually bigger pages.

>As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient.

It's a terrible attitude to have towards programmers, but looking at misaligned ops, I guess we can see a pattern from RISC-V authors here.

Most programmers do not target a concrete microcontroller and develop every line of code from scratch. They either develop portable libraries (e.g. https://docs.rs/getrandom) or build their projects using those libraries.

The whole raison d'être of an ISA is to provide a portable contract between hardware vendors and programmers. RISC-V authors shirk this responsibility with a "just look at your micro specs, lol" attitude.

The option to generate or not generate misaligned loads/stores does exist (-mno-strict-align / -mstrict-align). But of course that's a compile-time option, and of course the preferred state would be to have use of them on by default, but RVA23 doesn't sufficiently guarantee/encourage them not being unreasonably-slow, leaving native misaligned loads/stores still effectively-unusable (and off by default on clang/gcc on -march=rva23u64).

aka, Zicclsm / RVA23 are entirely-useless as far as actually getting to make use of native misaligned loads/stores goes.
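
To make the `-mstrict-align` trade-off concrete: when the compiler may not assume misaligned accesses are usable, a 32-bit load from an arbitrary address gets built from narrower loads plus shifts and ORs. Below is a rough Python model of the two strategies, using a `bytes` buffer as stand-in memory. The function names are made up for illustration, and this models the result only, not the actual instruction sequences a given compiler emits.

```python
import struct

def load_u32_native(mem, addr):
    """One 4-byte little-endian load at any address (what a native misaligned load returns)."""
    return struct.unpack_from("<I", mem, addr)[0]

def load_u32_bytewise(mem, addr):
    """The same value assembled from four single-byte loads, shifts and ORs."""
    return (mem[addr]
            | mem[addr + 1] << 8
            | mem[addr + 2] << 16
            | mem[addr + 3] << 24)
```

Both return the same value for every address; the difference is one memory access versus four plus the shift/OR work, which is why the compiler default matters so much here.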

The cursed thing is that RVA23 does basically guarantee that `vle8.v` + `vmv.x.s` on misaligned addresses is fast.
Yeah, that is quite funky; and indeed gcc does that. Relatedly, super-annoying is that `vle64.v` & co could then also make use of that same hardware, but that's not guaranteed. (I suppose there could be awful hardware that does vle8.v via single-byte loads, which wouldn't translate to vle64.v?)
> RVA23 doesn't guarantee them not being unreasonably-slow

Right, but it doesn't guarantee that anything isn't unreasonably slow, does it? I am free to make an RVA23 compliant CPU with a div instruction that takes 10k cycles. Does that mean LLVM won't output div? At some point you're left with either -mcpu=<specific cpu> or falling back to reasonable assumptions about the actual hardware landscape.

Do ARM or x86 make any guarantees about the performance of misaligned loads/stores? I couldn't find anything.

Exactly, I 100% agree, and IMO toolchains should default to assuming fast misaligned load/store for RISC-V.

However, the spec has the explicit note:

> Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance.

Which was a mistake. As you said any instruction could be arbitrarily slow, and in other aspects where performance recommendations could actually be useful RVI usually says "we can't mandate implementation".

I don't think x86/ARM particularly guarantee fastness, but at least they effectively encourage making use of them via their contributions to compilers that do. They also don't really need to given that they mostly control who can make hardware anyway. (at the very least, if general-purpose HW with horribly-slow misaligned loads/stores came out from them, people would laugh at it, and assume/hope that that's because of some silicon defect requiring chicken-bit-ing it off, instead of just not bothering to implement it)

Indeed one can make any instruction take basically-forever, but I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation (on at least some inputs), else having said instruction is strictly-redundant.

And if any significant general-purpose hardware actually did a 10k-cycle div around the time the respective compiler defaults were decided, I think there's a good chance that software would have defaulted to calling division through a function such that an implementation can be picked depending on the running hardware. (let's ignore whether 10k-cycle-division and general-purpose-hardware would ever go together... but misaligned-mem-ops+general-purpose-hardware definitely do)

> if general-purpose HW with horribly-slow misaligned loads/stores came out from them

How is that different for RISC-V?

> I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation

I agree! So just use misaligned loads if Zicclsm is supported. As you observed there's a feedback loop between what compilers output and what gets optimised in hardware. Since RVA23 hardware is basically non-existent at the moment you kind of have the opportunity to dictate to hardware "LLVM will use misaligned accesses on RVA23; if you make an RVA23 chip where this is horribly slow then people will laugh at you and assume it's some sort of silicon defect".

RISC-V is not particularly good at using opcode space, unfortunately.
I don't think it's too bad. The compressed extension was arguably a mistake (and shouldn't be in RVA23 IMO), but apart from that there aren't any major blunders. You're probably thinking about how JAL(R) basically always uses x1/x5 (or whatever it is), but I don't think that's a huge deal.

About 1/3 of the opcode space is used currently so there's a decent amount of space left.

What about page size?
It's 4k on x86 as well. Doesn't seem to hurt so bad -- at least, not enough to explain the risc-v performance gap.
Hmm? x86 has supported much larger “huge” page sizes for ages.
Yep, RISC-V also has these megapages. 4k is the last-level page size. You get larger pages (4M on 32-bit and 2M/1G on 64-bit) by terminating the walk at higher levels of the page table.
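
The arithmetic behind those numbers, as a small sketch: each page-table level translates a fixed number of virtual-page-number bits (10 for Sv32, 9 for Sv39), so stopping the walk one level early multiplies the mapped size by 2^10 or 2^9. The `MODES` table and `leaf_sizes` name are just for illustration.

```python
BASE_PAGE = 4096  # 4 KiB last-level page

# mode -> (number of page-table levels, VPN bits translated per level)
MODES = {"Sv32": (2, 10), "Sv39": (3, 9)}

def leaf_sizes(mode):
    """Page sizes obtained by ending the walk at level 0, 1, 2, ... (0 = last level)."""
    levels, bits = MODES[mode]
    return [BASE_PAGE << (bits * level) for level in range(levels)]
```

`leaf_sizes("Sv32")` gives 4 KiB and 4 MiB; `leaf_sizes("Sv39")` gives 4 KiB, 2 MiB and 1 GiB, matching the sizes quoted above.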
Yes, and Linux, at least historically, has not used them without explicit program opt-in. Often advice is to disable transparent huge pages for performance reasons. Not sure about other operating systems.

See, for example, https://www.pingcap.com/blog/transparent-huge-pages-why-we-d...

Huh, no? The usual advice is to enable THPs for performance, you only disable them in specific scenarios.
x86 has decades of knowhow and a zillion transistors to spend on making the memory pipeline, TLB caching & prefetching etc. etc. really really good. They work as well as they do despite the 4k base page size, not because of it.

If you'd start from a clean sheet today you'd probably end up with a somewhat bigger base page size. Not hugely larger though, as that wastes a lot of memory for most applications. Maybe 16k like some ARM chips use?

RISC-V has the Svnapot extension for large page sizes https://riscv.github.io/riscv-unified-db/manual/html/isa/isa...
You're correct but I guess my thoughts are if we're going to wind up with a mess of extensions, why not just use x86-64?
First, x86-64 also has “extensions” such as avx, avx2, and avx512. Not all “x86-64” CPUs support the same ones. And you get things like svm on AMD and vmx on Intel. Remember 3DNow?

X86-64 also has “profiles” which tell you what extensions should be available. There is x86-64v1 and x86-64v4 with v2 and v3 in the middle.

RVA23 offers a very similar feature-set to x86-64v4.

You do not end up with a mess of extensions. You get RVA23. Yes, RVA23 represents a set of mandatory extensions. The important thing is that two RVA23 compliant chips will implement the same ones.

But the most important point is that you cannot “just use x86-64”. Only Intel and AMD can do that. Anybody can build a RISC-V chip. You do not need permission.

It's actually worse because Intel is introducing APX now as well.
>Anybody can build a RISC-V chip. You do not need permission.

No, anybody can’t build a RISC-V chip. That’s the same mistake OSS proponents make. Just because something is open source doesn’t mean bugs will be found. And just because bugs are found doesn’t mean they will be fixed. The vast majority of people can’t do either.

The number of people who can design a chip implementation of the RISC-V ISA is much, much smaller, and the number who can get or own a FAB to manufacture the chips smaller still. You don’t need permission to use the ISA, but that is not the only gate.

I think it was clear that they were saying anybody is permitted to build a RISC-V chip, not that anybody has the skills.

> The number of people who can design a chip implementation

Thankfully you don't have to start from scratch. There are loads of open source RISC-V chip implementations you can start from.

> get or own a FAB to manufacture the chips

There are always FPGAs and also this:

https://fossi-foundation.org/blog/2020-06-30-skywater-pdk

> anybody can’t build a RISC-V chip

Yes, they can. My point is that nobody needs to give you permission. You can pretend that does not matter but China is about to educate us about what this means rather dramatically in the next few years.

And India is building RISC-V chips. And Europe is building RISC-V chips. Tenstorrent started in Canada (building RISC-V chips).

> the number who can get or own a FAB to manufacture the chips

Really? Almost nobody owns fabs and yet there are a multitude of chip makers. Getting access to a fab requires only money. It has nothing to do with the ISA or your skills. TSMC can make RISC-V chips just fine and already do. In some places, like China, RISC-V chips may be at the front of the line.

> The number of people who can design a chip implementation of the RISC-V ISA

Anybody can build a RISC-V chip. Build one yourself: https://github.com/tscheipel/HaDes-V

Every electrical engineer is going to know how to design a RISC-V chip. But you could also be an intelligent garbage man and design a RISC-V chip in your spare time using only open source materials. You can even tape it out.

https://tinytapeout.com/

"But that is only a 32 bit microcontroller!", you might say. Sure. But the skills to build RISC-V are going to propagate. Of course, that does not mean that everybody in the world is going to figure out how to build chips. That is clearly not my point. They will still be built primarily by a select few. But that is not unique to RISC-V by any stretch. In fact, less so.

The hard part about building a chip from scratch is not the ISA. You think that a world-class engineer working with ARM64 or amd64 today cannot design a RISC-V chip? That is like saying a carpenter building oak cabinets lacks the skills to make them with maple.

And since it is the same amount of work to start fresh regardless of ISA, why not start with RISC-V?

Except you do not have to start fresh with RISC-V because there are many, and will be many, many more, open designs to study and start with. Here is a 64 bit chip that implements the very latest RISC-V vector extensions:

https://github.com/tenstorrent/riscv-ocelot

Which, by the way, means that although most won't, anybody can build a RISC-V chip.

The RISC-V world will look like ARM. Most chip makers will license the core design off somebody else. But there will be more of those "somebody elses" to choose from. And there will be more people who choose to design their own silicon. Meta just bought Rivos. What for, do you think? And they did not have to talk to ARM about it.

1. Yes, but most of the code would run on anything older than 2007. 20 years of stable ISA.

2. Also, fundamentally all modern CPUs are still 64-bit versions of the 80386. MMU, protection, low level details are all the same.

This isn't really accurate, lots of commercial software is now compiled for newer x86 64 extensions.

If you're using OSS it doesn't really matter as you can compile it for whatever you want.

> lots of commercial software is now compiled for newer x86 64 extensions.

Almost all software I encountered - including Windows 10 and precompiled Debian 13 - needs only SSE4.2, essentially mid-2000s ISA. Intel produced until very recently (early 2020s) Celeron CPUs which did not even support AVX.

People focus on AVX entirely too much; it is stuff like POPCNT that matters more. Which, as you pointed out, is part of SSE4.2.
No, you really can’t. For some OSS, on hardware that has an OS supported by that software, with a compiler that supports that target and the options you want, and in some cases where the OSS has been written to support those options, you can compile it. Otherwise you are just out of luck.
I don't really understand your position here. Compiler availability isn't really that big of a deal, even on obscure or proprietary platforms. Why would there be "some cases where the OSS has been written to support those options"?
Because the ISA is not encumbered the way other ISAs are legally, and there are use cases where the minimal profile is fine for the sake of embedded whatever vs the cost to implement the extensions
> why not just use x86-64?

Uh, because you can't? It's not open in any meaningful sense.

The original amd64 came out in 2003. Any patents on the original instruction set have long expired, and even more so for 32-bit x86.
It's not about patents. Believe what you want but there is a reason nobody else is doing x86 or ARM chips unless they are allowed by the owner.
You're probably right. It would be helpful to say what the reason is, if it's not patents.
I'm not a lawyer but I would assume it's copyright. Kind of like API in software. In software somehow this does not apply most of the time. But it seems in hardware this is very real. But I would appreciate a lawyer jumping in.

I know, for example, that Berkeley, in the pre-RISC-V days, had a deal with Intel about using x86-64 for research. But they were not able to share the designs.