RISC-V: The New Architecture on the Block (klarasystems.com)
45 points by johnblood 1730 days ago
4 comments

At least one of the points in this article isn't true. The article states "RISC-V is not affected by the Meltdown and Spectre vulnerabilities... [because it does] not perform any speculative memory accesses" -- but that's only true for some designs, not anything done by the ISA.

Many of the higher performance RISC-V designs do, in fact, do speculation. RISC-V BOOM[0], by Berkeley, is vulnerable to Spectre[1][2]. One of the attempts to create an extension to the RISC-V ISA that has integrated security features (CHERI, [3]) itself was shown to be vulnerable to Spectre-like attacks[4].

The fact that most RISC-V chips were not vulnerable to Spectre is simply because they hadn't implemented a particular kind of performance optimization, not because there was anything intrinsic to the ISA that prevented them from being so.

[0]: https://boom-core.org/

[1]: https://github.com/abejgonzalez/boom-attacks

[2]: https://boom-core.org/docs/replicating_mitigating_spectre_ca...

[3]: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri...

[4]: https://kth.diva-portal.org/smash/get/diva2:1538245/FULLTEXT...

What you say is true, but it's not the end of the story. Higher performance RISC-V CPU cores that do speculation are under development or in some cases have been announced but not yet shipped. They have the opportunity to design to avoid those vulnerabilities from the start, which is a lot easier than fixing things up afterwards -- or simply losing a lot of performance on already-shipped CPUs by turning features off.

It's not actually very hard to avoid these kinds of vulnerabilities. All that is needed is to not permanently update state until the instruction is no longer speculative.

For example:

- don't update the branch prediction tables until the branch is proven to execute

- if loading a cache line causes another cache line to be evicted, keep both until it is known whether the load is supposed to execute

This requires provisioning a few more of these kinds of resources than you might previously have had, which costs a little silicon area, but it doesn't cost speed. Sometimes particularly demanding code might cause you to run out of these speculation resources and then you have to stall until an entry is freed up. This can already happen with things such as store buffers. If it never happens then you've probably over-provisioned :-)
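To make the idea above concrete, here is a minimal Python sketch of a cache whose speculative fills sit in a small side buffer until the load retires. Everything in it (the class name, `spec_entries`, the method names) is made up for illustration, and it ignores eviction and timing entirely; it only shows the commit/squash/stall behaviour described above, not how any real core does it.

```python
class SpeculativeCache:
    """Toy model: speculative fills live in a side buffer until the load retires."""

    def __init__(self, spec_entries=2):
        self.lines = set()            # committed cache lines (observable state)
        self.pending = {}             # load id -> line address, not yet committed
        self.spec_entries = spec_entries  # hypothetical number of side-buffer slots

    def speculative_load(self, load_id, line):
        """Return False when the load has to stall for a free side-buffer entry."""
        if line in self.lines:
            return True               # hit: nothing changes
        if len(self.pending) >= self.spec_entries:
            return False              # out of speculation resources: stall
        self.pending[load_id] = line
        return True

    def retire(self, load_id):
        """The load was on the correct path: make its fill permanent."""
        line = self.pending.pop(load_id, None)
        if line is not None:
            self.lines.add(line)

    def squash(self, load_id):
        """The load was mis-speculated: drop its fill without touching the cache."""
        self.pending.pop(load_id, None)

    def is_cached(self, line):
        return line in self.lines
```

With `spec_entries=1`, a second speculative miss stalls until the first load is retired or squashed, which is the "run out of speculation resources" case.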

> - don't update the branch prediction tables until the branch is proven to execute

> - if loading a cache line causes another cache line to be evicted, keep both until it is known whether the load is supposed to execute

This sounds interesting. How do these countermeasures actually prevent Spectre V1? And if they do, why didn't Intel implement them? Seems like they are fully microarchitectural, and therefore opaque to the software world.

All Spectre/Meltdown attacks go through the same 3 steps in this order:

1) taking control over transient instruction execution

2) controlled transient instructions access data that is legal for the victim (but illegal for us)

3) controlled transient instructions exfiltrate this data through a side channel between the microarch and the arch

For Spectre V1: Step 1) is performed by the Branch-Predictor, step 2) depends on the gadget targeted within the victim code, and step 3) is completed using a FLUSH+RELOAD or an EVICT+RELOAD triggered by a transient load.
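As a rough illustration of those three steps for V1, here is a toy Python model. It is not an exploit and does not touch real hardware: the "cache" is a Python set, the misprediction is a flag the caller passes in, and all names (`victim`, `attacker_recover`, `LINE`, the memory layout) are assumptions made up for the sketch.

```python
LINE = 64                                  # assumed cache line size for the probe array
memory = bytearray(b"public!!" + b"S")     # index 8 is a "secret" byte past the array
ARRAY_LEN = 8
cache = set()                              # set of cached probe-line addresses

def victim(index, predict_in_bounds):
    """Bounds-checked read; a wrong 'in bounds' prediction runs the body transiently."""
    in_bounds = index < ARRAY_LEN
    if in_bounds or predict_in_bounds:     # step 1: attacker-steered transient execution
        value = memory[index]              # step 2: read data the check should forbid
        cache.add(value * LINE)            # step 3: value-dependent cache fill
    if not in_bounds:
        return None                        # architectural result: access rejected
    return value

def attacker_recover():
    """Stand-in for FLUSH+RELOAD: find out which probe line became cached."""
    hits = [line // LINE for line in cache]
    return hits[0] if len(hits) == 1 else None
```

Calling `victim(8, True)` returns `None` architecturally, yet `attacker_recover()` then yields the secret byte. With `predict_in_bounds=False` nothing leaks, and if the cache fill were discarded on squash, step 3 would fail in the same way.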

If one of these 3 steps is not met then the attack is impossible. The brucehoult proposal (obviously not the first to suggest this) is to eliminate step 3): if no transient execution side effect/microarchitectural state is made observable, then there is no way to exfiltrate data. All Spectre/Meltdown attacks are therefore made unfeasible.

The problem is that brucehoult's proposal does not guarantee that all side channels are infeasible; it only guarantees that side channels based on branch prediction or caching are no longer possible.

Furthermore, microarchitectural optimizations are made to have an observable effect on the execution time. Therefore, it's likely that other timing side-channels will be exposed/discovered/used.

Yes, it is a fundamentally misleading assertion.
I made that statement in the article because that is what my research showed.
Lately I find myself upvoting every submission from Klara Systems[0]. Focus of the articles seems to be consistently good. I'm not affiliated with them, just an impressed onlooker.

[0] https://news.ycombinator.com/from?site=klarasystems.com

I do not. This article is fluff, with at least one glaring falsehood noted in a sibling comment.
I’ve recently gained an interest in computer architecture and systems programming in general after some high performance computing projects. Have been hearing about RISC-V from a distance, and am wondering if delving into it would be one of the best ways to satisfy this interest and gain some valuable knowledge/skills at the same time.
RISC-V is an OK design, but quite atypical. So, being simpler than others, it will be easier to understand; but for the same reason, does not acquaint you with details seen in designs currently used industrially.

Some of the design decisions, and their expressed rationale, are considered unpersuasive by many involved with other architectures. For example, a status register, cited as interfering with optimal out-of-order execution, turns out not to be a problem in actual chips (where they rename it like other registers), so was omitted from the RISC-V design on what amounts to superstition. Some instruction sequences that would need to be "fused" to match performance of common chips involve many more instructions than are fused in any extant design, so it is unclear that such fusion would be practically achievable.

If you're working on a student or hobby project then simplifying the scoreboard or whatever OoO scheduling structure you use by having all instructions be 2 inputs and 1 output is a big help. Compared to the complexity of a modern core that's a drop in the bucket but for a single person or a few friends or grad students it can be a big deal. And by doing that project you're still learning the important things. So for someone in the position OP is in I'd certainly recommend RISC-V. Don't ask "Is RISC-V good or bad?" but rather "For what purposes is RISC-V suited?"
A better question is, "Given RISC-V, what mistakes can we avoid next time around?"

They have been Turing complete from the first, so the differences are limited to speed, power consumption, and incidentals.

Newsflash: people who made different decisions on their own designs think they made the right decision. Update at 9.

ISAs without condition codes have been around for a long time, and very technically successful. MIPS and DEC Alpha, for example. (both killed by clueless management, not any technical issue)

The vast vast majority of condition code updates are either never used at all or are used by the very next instruction. In either case, there is little point in reifying them and no point in saving them.

Generating a condition a few instructions before it is used happens from time to time, at least on ISAs where only some instruction types update the condition codes, or there is a flag in the instruction to indicate whether to update them or not. An ISA without condition codes but with plenty of registers can do the same thing using SLT/SLTU (Set if Less Than [Unsigned]) to generate a 0 or 1 in a normal register. Or a simple XOR or SUB for equality tests.
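For anyone who hasn't met SLT/SLTU, here is a small Python model of what they compute on a 64-bit register, plus the XOR-based equality test. The helper names and the choice of XLEN = 64 are mine; only the 0/1 result convention comes from the ISA.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def to_signed(x):
    """Interpret the low XLEN bits of x as a two's complement value."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def slt(a, b):
    """Set if Less Than: signed compare, result 0 or 1 in an ordinary register."""
    return 1 if to_signed(a) < to_signed(b) else 0

def sltu(a, b):
    """Set if Less Than Unsigned: same bits, compared as unsigned values."""
    return 1 if (a & MASK) < (b & MASK) else 0

def is_equal(a, b):
    """XOR is zero iff the inputs match; 'unsigned less than 1' turns that into 0/1."""
    return sltu(a ^ b, 1)
```

The same bit pattern can compare differently: all ones is -1 for `slt` but the largest value for `sltu`.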

Historically, use of condition codes is because your instructions aren't big enough to contain two source operands, a test, and a reasonable branch offset. Now it's because you're descended from such an ISA.

Similarly, many early ISAs did conditional skip instead of conditional branch because their instructions weren't big enough to test a condition and also hold a useful branch offset. Some of them could integrate a compare with the skip, but some of them needed three instructions: compare -> CCs; skip based on CCs; jump. Not high performance.

Compare and branch, all in one instruction, is best most of the time if you have the opcode space for it.

People who have designed other systems can point out many of their own mistakes, too. Being unable to spot any should inspire less confidence, not more.

Setting a 0 or 1 in SLT was another design error. People designing GPUs demonstrate that they know the better design sets a 0 or ~0 (all ones).

Huge instructions have been regretted enough to motivate abbreviated versions. Even in RISC-V.

And, as has already been noted in this forum, lack of a reliably available popcount instruction has been subsequently corrected, at great expense, practically everywhere.

All of which really only means I'm ready for Risc-6. With some care, it should be able to re-use much of the ecosystem work from RISC-V.

> Setting a 0 or 1 in SLT was another design error. People designing GPUs demonstrate that they know the better design sets a 0 or ~0 (all ones).

Going from 0/1 to 0/~0 (or conversely) just takes a NEG instruction. All in all, it's a trivial difference. And it's hard to say what's more convenient in actual code.

"Just takes" is another way to say "takes". And, when all your instructions are at least 4 bytes, that takes another 4 bytes.
I completely agree with you on 0/1 vs 0/-1 for SLT and you can easily find me saying so. For example: https://lists.riscv.org/g/tech-bitmanip/message/496

It was an error, though a rather minor one, to follow the C language so closely. I can and have pointed out other minor mistakes in RISC-V in the past -- none of them serious enough to abandon it and start over.

I'll quote myself from there, below.

32 bits is not such a huge instruction. ARM decided it's good enough for their new(ish) 64 bit ISA, and it's about the average size of x86_64 instructions.

Original RISC-V (v1.0) has only and exactly the instructions needed to implement C. That's enough for many or most applications, and will be available as a support option forever. The upcoming RVA22 specification for Applications Processors, which will be ratified before the end of the year includes an SVE-like vector extension and also Bit Manipulation extensions (along with many others). The Zbb (Basic bit-manipulation) extension includes cpop along with clz and ctz and rotate. There is also andn, orn, xnor, max, maxu, min, minu, sext.b, sext.h, zext.h, and rev8 (reverse bytes in a register). Plus a unique instruction orc.b which replaces any non-zero byte in the source operand with all ones. There is also scalar crypto and cache manipulation (prefetch, flush etc).
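A few of those Zbb operations are easy to model in Python if you want to see what they do. This is a sketch for a 64-bit register, written from the instruction descriptions rather than taken from any reference implementation; the function names just mirror the mnemonics, and `has_zero_byte` is a hypothetical helper showing one use often cited for orc.b (string scanning).

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def cpop(x):
    """Count the set bits in the register."""
    return bin(x & MASK).count("1")

def orc_b(x):
    """Replace every non-zero byte with 0xFF; zero bytes stay 0x00."""
    out = 0
    for i in range(XLEN // 8):
        if (x >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out

def rev8(x):
    """Reverse the order of the bytes in the register."""
    return int.from_bytes((x & MASK).to_bytes(XLEN // 8, "little"), "big")

def has_zero_byte(x):
    """A word contains a zero byte iff orc.b does not produce all ones."""
    return orc_b(x) != MASK
```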

Perhaps RVA22 is your hypothetical Risc-6.

-----

There are five reasons you might use SLT / SLTU, in (I think) descending order of how common they are, and the implications had -1 been used instead of 1:

1) to generate a zero/non-zero value. No difference.

2) to generate a mask. Using 0 and -1 is better, saving a NEG or a subtract 1, depending on whether you reverse the condition or not.

3) to generate a value that can be AND / OR / XOR etc with other such values. No difference.

4) to assign to a canonical C/C++ true/false, or mix with them using AND / OR / XOR. Worse -- have to do an ANDI #1 before using the final result.

5) to generate a canonical C true/false and add or subtract it from something. No difference. Just flip add to subtract or vice versa.
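Cases 2 and 4 from the list above are the ones where the convention costs an instruction, so here they are side by side as a Python sketch. `slt_mask` is the hypothetical 0/-1 variant, not a real instruction, and plain Python ints stand in for signed register values.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def slt01(a, b):
    """SLT as RISC-V defines it: 1 if a < b, else 0."""
    return 1 if a < b else 0

def slt_mask(a, b):
    """Hypothetical variant: all ones (-1) if a < b, else 0."""
    return MASK if a < b else 0

def select_with_01(a, b, x, y):
    """Case 2 with 0/1: one extra NEG widens the bit into a mask."""
    m = (-slt01(a, b)) & MASK          # neg
    return (x & m) | (y & ~m & MASK)

def select_with_mask(a, b, x, y):
    """Case 2 with 0/-1: the comparison result already is the mask."""
    m = slt_mask(a, b)
    return (x & m) | (y & ~m & MASK)

def c_bool_from_mask(a, b):
    """Case 4 with 0/-1: one extra ANDI 1 yields a canonical C true/false."""
    return slt_mask(a, b) & 1
```

Either way it is one instruction gained in one case and lost in the other, which is why the choice comes down to which case is more common in real code.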

Interestingly, a time when you do want 0 or 1 is the examples in the original superoptimiser paper from 1987.

https://web.stanford.edu/class/cs343/resources/superoptimize...

They first considered the function:

  int signum (int x) {
    if(x > 0) return 1;
    else if(x < 0) return -1;
    else return 0;
  }

They showed the superoptimiser finding the following unexpected 68020 sequence, making use of the carry flag:

  (x in d0)
  add.l d0,d0 ;add d0 to itself
  subx.l d1,d1 ;subtract (d1 + Carry) from d1
  negx.l d0  ;put (0 - d0 - Carry) into d0
  addx.l d1,d1 ;add (d1 + Carry) to d1
  (signum(x) in d1) (4 instructions)

This is much more straightforward on RISC-V:

  (x in a0)
  slt a1,a0,zero  # a1 = 1 if x is negative, 0 if 0 or positive
  slt a0,zero,a0 # a0 = 1 if x is positive, 0 if 0 or negative
  sub a0,a0,a1 # 1-0 = 1 if positive, 0-0 = 0 if zero, 0-1 = -1 if negative
-----
> This is much more straightforward on RISC-V

AIUI, if SLT returns 0 or -1 you can then reverse the arguments to SUB and get a correct result. If you return the result in a1 you can also keep the 2-operand compressed form of SUB, so there's effectively no difference. Equivalently, you can keep the SUB insn unchanged (thus using a 2-operand form to return in a0) while flipping the previous SLT insns: SLT a1, zero, a0; SLT a0, a0, zero.
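A quick way to check that claim is to model the three-instruction sequence in Python under both conventions. The `true_value` parameter is only a device for the sketch (real SLT always produces 1), and Python ints stand in for signed registers.

```python
def slt(a, b, true_value=1):
    """Set-if-less-than with a configurable 'true' result (1 in real RISC-V)."""
    return true_value if a < b else 0

def signum_riscv(x):
    """The sequence as posted, with the real 0/1 SLT."""
    a1 = slt(x, 0)         # slt a1,a0,zero
    a0 = slt(0, x)         # slt a0,zero,a0
    return a0 - a1         # sub a0,a0,a1

def signum_minus_one(x):
    """Hypothetical 0/-1 SLT: same SLTs, SUB operands reversed, result in a1."""
    a1 = slt(x, 0, -1)     # slt a1,a0,zero
    a0 = slt(0, x, -1)     # slt a0,zero,a0
    return a1 - a0         # sub a1,a1,a0

def signum_minus_one_flipped(x):
    """Hypothetical 0/-1 SLT: SLTs flipped, original SUB kept."""
    a1 = slt(0, x, -1)     # slt a1,zero,a0
    a0 = slt(x, 0, -1)     # slt a0,a0,zero
    return a0 - a1         # sub a0,a0,a1
```

All three return -1, 0, or 1 for negative, zero, and positive inputs, so the instruction count is the same under either convention.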

True, it would be better if they had defined RISC-V's C and C++ ABI to make 'true' physically equal to -1, negating or adding 1 to it when actually necessary to treat it as an int value. That would be rare.

The very late addition of the reified B extensions (and others) will be a continuing problem, as builds will not be able to count on them having been implemented. (Trap emulation would be much worse than useless.) The lack of rotate operations in the base instruction set is a problem for implementing modern encryption systems. On embedded chips likely to appear in routers and switches, "extensions" such as the Bs are especially likely to be omitted.
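For reference, this is roughly what a rotate costs when only base-ISA style operations are available: two shifts and an OR, plus a negate when the amount is in a register. It is a Python sketch for a 64-bit register; `rotl_base` is a made-up name.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def rotl_base(x, n):
    """Rotate left built from shift-left, shift-right, and OR."""
    n &= XLEN - 1                                # only the low 6 bits of the amount count
    left = (x << n) & MASK                       # sll
    right = (x & MASK) >> ((-n) & (XLEN - 1))    # neg, then srl by (XLEN - n) mod XLEN
    return left | right                          # or
```

With a rotate instruction this is a single operation, which is the gap the comment above is pointing at for cipher inner loops.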

It would not be necessary to abandon the work on RISC-V to do a Risc-6. Most of the work done could carry over.

I have seen the mask generation use case only once, for a constant-time ternary operator in cryptography, but I use booleans regularly - your use case 4. The cryptography case is the x25519 algorithm, where the condition controlling the conditional swap is a bit extracted from the private key, and the algorithm uses 255 bits of the key sequentially.
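The conditional swap mentioned above is usually written with a mask rather than a branch. Here is a minimal Python sketch of that pattern; `cswap` and `MASK255` are illustrative names, and Python integers are not constant-time, so this only shows the data flow, not a safe implementation.

```python
MASK255 = (1 << 255) - 1   # width chosen to match the 255-bit values in the discussion

def cswap(bit, a, b):
    """Swap a and b when bit == 1, leave them alone when bit == 0, with no branch."""
    mask = -bit & MASK255          # 0 -> all zeros, 1 -> all ones
    t = mask & (a ^ b)             # either 0 or the full difference
    return a ^ t, b ^ t
```

The `-bit` step is the NEG that a 0/-1 comparison result would make unnecessary.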
The "1980's called" issue that always gets to me is the absence of arithmetic overflow (or divide by 0) detection. You end up having to generate several additional instructions if you want to catch those errors (-ftrapv in gcc). On other architectures that's done with condition flags or traps.

This doesn't apply to floating point, where condition flags (confusingly called exceptions in IEEE parlance) are mandated by IEEE 754 and afaik Risc-V implements them conformantly. I don't see why they couldn't also do something like that for integers.
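To show what "several additional instructions" means for a signed add, here is a Python model of the usual compare-based check, with the matching RISC-V instructions in the comments. Register names and the helper names are illustrative, and the inputs are assumed to already be in signed 64-bit range.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def to_signed(x):
    """Wrap x to XLEN bits and interpret it as two's complement."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def add_checked(a, b):
    """Return (wrapped_sum, overflowed) for a signed XLEN-bit add."""
    t0 = to_signed(a + b)          # add  t0, t1, t2   (wraps silently)
    t3 = 1 if b < 0 else 0         # slti t3, t2, 0
    t4 = 1 if t0 < a else 0        # slt  t4, t0, t1
    return t0, t3 != t4            # bne  t3, t4, overflow
```

The sum should be smaller than `a` exactly when `b` is negative; if those two facts disagree, the add wrapped. That is three extra instructions per checked add, versus a flag test or trap on ISAs that have one.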

I guess you use this mode when your application is IO bound.
Depends on your background I guess? If you've taken a course like Computation Structures sure!

https://ocw.mit.edu/courses/electrical-engineering-and-compu...

That class looks amazing. Took a higher level computer systems course in undergrad that roughly followed http://csapp.cs.cmu.edu/ (great course).
Will be nice to see some affordable motherboards and processors with this architecture.