by codedokode 1103 days ago
RISC-V is copying wrong decisions made decades ago. Against all common sense, it doesn't trap on overflow in arithmetic operations and silently wraps the number around, producing an incorrect result. Furthermore, it does not provide an overflow flag (or any alternative), so it is difficult to implement, for example, addition of 256-bit numbers.
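To make the 256-bit case concrete, here is a minimal sketch in Rust of multi-limb addition where the carry is recovered by comparison rather than read from a flag, which is roughly what code has to do on an ISA without a carry flag. The function name `add_u256` and the little-endian four-limb layout are assumptions made for illustration only.

```rust
/// Adds two 256-bit unsigned integers stored as four little-endian u64 limbs.
/// Returns the wrapped 256-bit sum and the final carry-out.
/// (Hypothetical helper; the name and layout are illustrative assumptions.)
fn add_u256(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], bool) {
    let mut sum = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        // Add the two limbs; a carry occurred if the result wrapped below a[i].
        let partial = a[i].wrapping_add(b[i]);
        let carry_from_limbs = (partial < a[i]) as u64;
        // Add the incoming carry; this wraps only if `partial` was all ones.
        let total = partial.wrapping_add(carry);
        let carry_from_incoming = (total < partial) as u64;
        sum[i] = total;
        // At most one of the two carries can be set, so OR is sufficient.
        carry = carry_from_limbs | carry_from_incoming;
    }
    (sum, carry != 0)
}
```

Each limb costs two additions and two comparisons here, where an ISA with add-with-carry would need a single instruction per limb.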

It doesn't trap because trapping means you need to track the possibility of a branch at every single arithmetic operation. It doesn't have a flag so flag renaming isn't needed: you can get the overflow from a compare instruction, and macro-op fusion should just work.
> you need to track the possibility of a branch at every single arithmetic operation

Every memory access can cause a trap, but CPUs seem to have no problem with it. The branch is very unlikely and can always be predicted as "not taken".

Hell, with non-maskable interrupts, any instruction can cause a trap!
Not even that - instruction fetch can cause a page fault. When an NMI happens, the CPU still has the choice of when to service it. If it needs to flush the pipeline, it might as well retire the instructions up to the first store.
Managing memory coherency is probably the single hardest part to design in any given CPU. Why add even more hard things (especially if they can interact and add even more complexity on top)?

Get rid of what complexity you can then deal with the rest that you must have.

Coherency is very hard but it's not what causes traps from accessing memory. That part is a relatively simple permission check.
I love how each and every criticism of RISC-V's decisions ignores the rationale behind them.

Yes, that idea was evaluated, weighed and discarded as harmful, and the details are referenced in the spec itself.

I tried searching the spec [1] for "overflow" and here is what it says on page 17:

> We did not include special instruction-set support for overflow checks on integer arithmetic operations in the base instruction set, as many overflow checks can be cheaply implemented using RISC-V branches.

> For general signed addition, three additional instructions after the addition are required

Is replacing one instruction with four "cheap"? According to some old mainframe-era research (I cannot find the link now), addition is one of the most frequently used instructions, and they suggest that we replace each addition with four instructions?
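For reference, the signed check the spec describes rests on one observation: the sum is less than one operand if and only if the other operand is negative. Below is a minimal Rust sketch of that logic. The function name `add_overflows` is hypothetical, and the assembly in the comments is my recollection of the sequence the spec suggests, so treat the register names as illustrative.

```rust
/// Signed addition with overflow detection done purely by comparisons,
/// mirroring a flagless add + slti + slt + bne style sequence.
/// (Hypothetical helper; the assembly in comments is illustrative.)
fn add_overflows(a: i64, b: i64) -> (i64, bool) {
    let sum = a.wrapping_add(b); // add  t0, t1, t2
    let b_negative = b < 0;      // slti t3, t2, 0
    let sum_less = sum < a;      // slt  t4, t0, t1
    // Without overflow, sum < a holds exactly when b is negative.
    (sum, b_negative != sum_less) // bne  t3, t4, overflow
}
```

The two comparisons depend only on the sum and the inputs, not on each other, so the final branch is the only step that waits on both.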

Their "rationale" is not rational at all. It doesn't make sense.

Overflow checks should be free (no additional instructions required), otherwise we will see the same story we have seen for the last 50 years: compiler writers do not want to implement checks because they are expensive; language designers do not want to use proper arithmetic because it is expensive. And CPU designers do not want to implement traps because no language needs them. As a result, there will be errors and vulnerabilities. A vicious circle.

What also surprises me is that they added a fused multiply-add instruction, which can easily be replaced by two separate instructions, is not really needed in most applications (like a web browser), and is difficult to implement (if I am not mistaken, you need to read three registers instead of two, which might require additional ports in the register file just for this useless instruction).

[1] https://github.com/riscv/riscv-isa-manual/releases/download/...

So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free and is seldom used -- but against some imaginary ideal CPU that doesn't exist or no one uses because it's so slow.
> So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free and is seldom used

How do people do overflow checking on x86 and ARM in practice? For languages which implement it, such as Rust or Ada?

I know 32-bit x86 has the INTO instruction, which raises interrupt 4 if the overflow flag (OF) is set – but it was removed in x86-64, which gives me the impression that even languages which did do checked arithmetic weren't using it.

> but against some imaginary ideal CPU that doesn't exist

I'm not the person you are responding to, but to try to read their argument charitably (to "steelman" it) – if a person thinks checked arithmetic is an important feature, RISC-V's decision not to include it could be seen as a missed opportunity.

> or no one uses because it's so slow.

Is it inherently slow? Or is it just a chicken-and-egg problem, where hardware designers feel no motivation to make it fast because software doesn't use it, while software doesn't use it because the hardware doesn't make it fast enough?

> How do people do overflow checking on x86 and ARM in practice? For languages which implement it, such as Rust or Ada?

> I know 32-bit x86 has the INTO instruction, which raises interrupt 4 if the overflow flag (OF) is set – but it was removed in x86-64, which gives me the impression that even languages which did do checked arithmetic weren't using it.

Languages still use the overflow flag, they just don't use interrupts. I'm most familiar with Rust, where if the program wants a boolean value representing overflow (e.g., with checked_* or overflowing_* operations), LLVM obtains that value using a SETO or SETNO instruction following the arithmetic operation. If the program just wants to branch on the result of overflow, LLVM performs it using a JO or JNO instruction. Overflow checks that crash the program (e.g., in debug builds) are implemented as an ordinary branch that calls into the panic handler.
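A minimal sketch of the three usage patterns described above, using Rust's standard integer methods (`overflowing_add` and `checked_add` are real std APIs; the wrapper function names are made up for illustration). What machine code the compiler emits for each is not shown here and will depend on the target and optimization level.

```rust
/// Pattern 1: the program wants the overflow condition as a boolean value.
fn add_with_flag(a: i32, b: i32) -> (i32, bool) {
    a.overflowing_add(b)
}

/// Pattern 2: the program branches on overflow; `None` signals it occurred.
fn add_or_none(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Pattern 3: overflow crashes the program through the panic handler,
/// similar in spirit to what a debug build does for a plain `a + b`.
fn add_or_panic(a: i32, b: i32) -> i32 {
    a.checked_add(b).expect("attempt to add with overflow")
}
```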

> So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free

Are you suggesting we should carry over bad design decisions made in the past? x86 is an exhibition of bad choices and I don't think we need to copy them.

> and is seldom used

I don't believe that is the case. I think that in most cases you need non-wrapping addition, for example, if you are calculating totals for a customer's order, counting the number of visits to a website, or calculating loads in a concrete beam.

Actually, wrapping addition is the one that is seldom used, in niche areas like cryptography. So it surprises me that the kind of addition that is used more often (non-wrapping) requires more instructions than exotic wrapping addition. I fail to understand what the CPU designers were thinking.
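To illustrate the order-total example, here is a small Rust sketch contrasting the two behaviours on the same input. The function names and the idea of storing line items as `u32` amounts are assumptions for illustration.

```rust
/// Sums line items with wrapping arithmetic: an overflow silently
/// produces a small, plausible-looking, wrong total.
fn wrapping_total(line_items: &[u32]) -> u32 {
    line_items.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Sums line items with checked arithmetic: an overflow is reported
/// as `None` instead of being folded into the result.
fn checked_total(line_items: &[u32]) -> Option<u32> {
    line_items.iter().try_fold(0u32, |acc, &x| acc.checked_add(x))
}
```

With the items `[u32::MAX, 2]`, the wrapping version returns 1, while the checked version returns `None`.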

You can't solve all the world's problems in one step. RISC-V solves a number of important problems, while making it as easy as practical to run existing software quickly on it.

If you want to have checked arithmetic, RISC-V's openness allows you to make a custom extension, implement hardware for it (FPGA is cheap and easy), implement software support and demonstrate the claimed benefits, and encourage others to also implement your extension, or standardise it.

It is simply not possible to do this in the x86 or Arm worlds. And that is one of the problems RISC-V solves -- a meta problem, not some one individual pet problem, but potentially all of them.

I agree that wrapping is a bad default, but I can provide some rationale.

If you do wrapped addition without flags, you have one self-contained instruction that even covers signed and unsigned integers. If you want other behaviour, you then have to specialize for signed or unsigned, specialize for the choice of wrap/trap/flag, and make those traps and flags work nicely with whatever other traps or flags you might have.

So, yeah, if you want the simplest possible thing, driven by some decision other than the best outcomes for software in general, then you would choose wrapping addition without flags or traps.

This seems like an oversimplification of how these things work. Every architecture is going to provide a way to do wrapping arithmetic. You also seem to want dedicated instructions that check for overflow. Some architectures have this! But what happens in practice is that people are smarter than this and recognize that the number of instructions emitted is irrelevant if some of them are inherently slower than others. Compilers emit lea on x86-64 these days to save ports and you think they’ll use your faulting add that takes an extra cycle? Definitely not.

Anyway, this game is really going to be won by people higher in the stack paying the price for bounds checks and including them no matter what, because not having them is not tenable for their use case. This drives processor manufacturers to make these checks more efficient, which they have been doing for many years.

> Compilers emit lea on x86-64 these days to save ports and you think they’ll use your faulting add that takes an extra cycle? Definitely not.

"Faulting" addition should be as fast as wrapping addition and take a single instruction. Yes, I want hardware-accelerated overflow checking because it leads to more accurate results and prevents security vulnerabilities.

By the way, I want FPU operations to cause traps too (when getting infinity or NaN).
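Without hardware traps, the closest software stand-in is an explicit check after the operation. A minimal Rust sketch, assuming a hypothetical helper named `finite_or_none` (`f64::is_finite` is a real std method that is false for both infinities and NaN):

```rust
/// Returns the value only if it is a finite number; infinity and NaN
/// are reported as `None`, as a software substitute for an FPU trap.
/// (Hypothetical helper for illustration.)
fn finite_or_none(x: f64) -> Option<f64> {
    if x.is_finite() {
        Some(x)
    } else {
        None
    }
}
```

Unlike a trap, this catches the problem only at the points where the caller remembers to check.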

But there’s inherently more work. You need to keep track of some extra state and when the overflow actually occurs you need to unwind processor state and deliver the exception. You can make this cheap but it definitely cannot be free. From the words you’re using I feel like you have a model in your head that if you can just encode something into an instruction it’s now fast and that instructions are the way we measure how “fast” something is, but that’s not true. Modern processors can retire multiple additions per cycle. What this will probably look like is both of them are single instructions and one of them has a throughput of 4/cycle and the other one will be 3/cycle and compiler authors will pick the former every time.
> Modern processors can retire multiple additions per cycle.

Then add multiple overflow checking units.

> one of them has a throughput of 4/cycle and the other one will be 3/cycle and compiler authors will pick the former every time.

Currently on RISC-V checked addition requires 4 dependent instructions, so its throughput is about 1 addition/cycle.

Comparisons of code compiled for x86 or RISC-V show that (on average), the RISC-V code is significantly smaller.

Any code size increases are made up for elsewhere and they STILL get smaller code too.

And, amusingly, the instruction count is also very competitive, especially inside loops.

Furthermore, it achieves all of that with a much simpler ISA that matches x86 and Arm in features, while having an order of magnitude fewer instructions to implement and verify.

Compiler output is not a good way to show off the best of an ISA (which is more an indictment of how bad compilers actually are at optimising for code density). Look at the demoscene. x86 can be an order of magnitude denser than lame compiler output.

RISC-V wasn't around when this paper was written, but it's close enough to MIPS to disprove the claim that "RISC-V code is significantly smaller": https://web.eece.maine.edu/~vweaver/papers/iccd09/iccd09_den...

>Averages over large bodies of code do not matter

>Compiler output does not matter

>1987 paper

>RISC-V encoding "close enough to MIPS"

>disprove the claim that "RISC-V code is significantly smaller"

F for effort.

> 1987 paper

Did you even look at the link?

Neither shilling nor trolling is welcome here. Is there a relationship you haven't disclosed with RISC-V?

Bad decisions are evaluated & weighed, and often documented. I love the assumption that the RISC-V team is both infallible and immune to bias.
Great. Then confront the rationale, instead of dismissing it or pretending it is not there.