I searched the spec [1] for "overflow", and here is what it says on page 17:
> We did not include special instruction-set support for overflow checks on integer arithmetic operations in the base instruction set, as many overflow checks can be cheaply implemented using RISC-V branches.
> For general signed addition, three additional instructions after the addition are required
Is this "cheap", replacing one instruction with four? According to some old mainframe-era research (I cannot find the link now), addition is one of the most frequently used instructions, and they suggest that we replace each one with four?
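For reference, here is a rough Rust model of what those extra instructions compute. The function names are made up for illustration, and the assembly in the comments is the sequence as I recall it from the spec's commentary, so treat the register names as an assumption. The unsigned case needs only one extra branch; the general signed case needs the three extra instructions quoted above.

```rust
/// Unsigned add: the wrapped sum is smaller than an operand exactly when
/// the addition overflowed. Models `add` followed by a single `bltu`.
fn add_u32_overflows(a: u32, b: u32) -> (u32, bool) {
    let sum = a.wrapping_add(b);
    (sum, sum < a)
}

/// General signed add: overflow happened exactly when "b is negative"
/// disagrees with "the sum is less than a".
fn add_i32_overflows(a: i32, b: i32) -> (i32, bool) {
    let sum = a.wrapping_add(b); // add  t0, t1, t2
    let b_negative = b < 0;      // slti t3, t2, 0
    let sum_less = sum < a;      // slt  t4, t0, t1
    (sum, b_negative != sum_less) // bne  t3, t4, overflow
}
```

Both helpers should agree with Rust's built-in `overflowing_add` for every input; the point is only to show what the three extra instructions are doing.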
Their "rationale" is not rational at all. It doesn't make sense.
Overflow checks should be free (no additional instructions required); otherwise we will see the same story we have seen for the last 50 years: compiler writers do not want to implement checks because they are expensive, language designers do not want to use proper arithmetic because it is expensive, and CPU designers do not want to implement traps because no language needs them. As a result, there will be errors and vulnerabilities. A vicious circle.
What also surprises me is that they added a fused multiply-add instruction, which can easily be replaced by 2 separate instructions, is not really needed in most applications (like a web browser), and is difficult to implement (if I am not mistaken, you need to read 3 registers instead of 2, which might require additional ports in the register file only for this useless instruction).
So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free and is seldom used -- but against some imaginary ideal CPU that doesn't exist or no one uses because it's so slow.
> So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free and is seldom used
How do people do overflow checking on x86 and ARM in practice? For languages which implement it, such as Rust or Ada?
I know 32-bit x86 has the INTO instruction, which raises interrupt 4 if the overflow flag (OF) is set – but it was removed in x86-64, which gives me the impression that even languages which did do checked arithmetic weren't using it.
> but against some imaginary ideal CPU that doesn't exist
I'm not the person you are responding to, but to try to read their argument charitably (to "steelman" it) – if a person thinks checked arithmetic is an important feature, RISC-V's decision not to include it could be seen as a missed opportunity.
> or no one uses because it's so slow.
Is it inherently slow? Or is it just a chicken-and-egg problem: hardware designers feel no motivation to make it fast because software doesn't use it, while software doesn't use it because the hardware doesn't make it fast enough?
> How do people do overflow checking on x86 and ARM in practice? For languages which implement it, such as Rust or Ada?
> I know 32-bit x86 has the INTO instruction, which raises interrupt 4 if the overflow flag (OF) is set – but it was removed in x86-64, which gives me the impression that even languages which did do checked arithmetic weren't using it.
Languages still use the overflow flag, they just don't use interrupts. I'm most familiar with Rust, where if the program wants a boolean value representing overflow (e.g., with checked_* or overflowing_* operations), LLVM obtains that value using a SETO or SETNO instruction following the arithmetic operation. If the program just wants to branch on the result of overflow, LLVM performs it using a JO or JNO instruction. Overflow checks that crash the program (e.g., in debug builds) are implemented as an ordinary branch that calls into the panic handler.
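A minimal sketch of the Rust side of this, using the standard `checked_add` and `overflowing_add` methods on `i32`. The wrapper function names are hypothetical, and which instructions LLVM actually emits depends on the target and optimization level.

```rust
/// Returns `None` on overflow; the caller typically branches on the result.
fn add_or_none(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Returns the wrapped sum together with a boolean overflow flag.
fn add_with_flag(a: i32, b: i32) -> (i32, bool) {
    a.overflowing_add(b)
}

/// Branches on overflow and substitutes a caller-supplied fallback.
fn add_or_fallback(a: i32, b: i32, fallback: i32) -> i32 {
    match a.checked_add(b) {
        Some(sum) => sum,
        None => fallback,
    }
}
```

For example, `add_or_none(i32::MAX, 1)` is `None`, while `add_with_flag(i32::MAX, 1)` is `(i32::MIN, true)`.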
> So you are criticising RISC-V not compared to its actual x86 and Arm competition -- where overflow checking is also not free
Do you suggest we should carry on bad design decisions made in the past? x86 is an exhibition of bad choices and I don't think we need to copy them.
> and is seldom used
I don't believe that is the case. I think that in most cases you need non-wrapping addition, for example, if you are calculating totals for a customer's order, counting the number of visits to a website, or calculating loads in a concrete beam.
Actually, wrapping addition is the one that is seldom used, in niche areas like cryptography. So it surprises me that the kind of addition that is used more often (non-wrapping) requires more instructions than exotic wrapping addition. I fail to understand what the CPU designers were thinking.
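To make the distinction concrete, here is a small Rust sketch with two hypothetical functions: an order total, where a wrapped result would be a silent bug, and a toy checksum (not a real cryptographic function), where wrapping is the intended behaviour.

```rust
/// Sums prices in cents, returning `None` if the total would overflow
/// instead of silently wrapping around.
fn order_total_cents(prices: &[u64]) -> Option<u64> {
    prices.iter().try_fold(0u64, |acc, &p| acc.checked_add(p))
}

/// Toy multiplicative checksum: here wrapping arithmetic is deliberate.
fn wrapping_checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |h, &b| h.wrapping_mul(31).wrapping_add(b as u32))
}
```

With these definitions, `order_total_cents(&[199, 250, 1])` is `Some(450)` and `order_total_cents(&[u64::MAX, 1])` is `None`.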
You can't solve all the world's problems in one step. RISC-V solves a number of important problems, while making it as easy as practical to run existing software quickly on it.
If you want to have checked arithmetic, RISC-V's openness allows you to make a custom extension, implement hardware for it (FPGA is cheap and easy), implement software support and demonstrate the claimed benefits, and encourage others to also implement your extension, or standardise it.
It is simply not possible to do this in the x86 or Arm worlds. And that is one of the problems RISC-V solves -- a meta problem, not some one individual pet problem, but potentially all of them.
I agree that wrapping is a bad default, but I can provide some rationale.
If you do wrapped addition without flags, you have one self-contained instruction that even covers signed and unsigned integers. If you want other behaviour, you then have to specialize for signed or unsigned, specialize for the choice of wrap/trap/flag, and make those traps and flags work nicely with whatever other traps or flags you might have.
So, yeah, if you want the simplest possible thing, driven by some decision other than the best outcomes for software in general, then you would choose wrapping addition without flags or traps.
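The point about one wrapping add serving both signed and unsigned integers can be checked in a few lines of Rust (helper names are made up for illustration). The wrapped bit pattern is identical either way; only the overflow condition depends on the interpretation.

```rust
/// The wrapped sum has the same bits whether the operands are treated
/// as unsigned or as two's-complement signed values.
fn same_bits(a: u32, b: u32) -> bool {
    let unsigned_sum = a.wrapping_add(b);
    let signed_sum = (a as i32).wrapping_add(b as i32);
    unsigned_sum == signed_sum as u32
}

/// Returns (unsigned_overflow, signed_overflow) for the same operand bits.
fn overflow_flags(a: u32, b: u32) -> (bool, bool) {
    let (_, unsigned_overflow) = a.overflowing_add(b);
    let (_, signed_overflow) = (a as i32).overflowing_add(b as i32);
    (unsigned_overflow, signed_overflow)
}
```

For instance, `overflow_flags(0x7FFF_FFFF, 1)` is `(false, true)` while `overflow_flags(0xFFFF_FFFF, 1)` is `(true, false)`, which is why any checked variant has to pick a signedness.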
This seems like an oversimplification of how these things work. Every architecture is going to provide a way to do wrapping arithmetic. You also seem to want dedicated instructions that check for overflow. Some architectures have this! But what happens in practice is that people are smarter than this and recognize that the number of instructions emitted is irrelevant if some of them are inherently slower than others. Compilers emit lea on x86-64 these days to save ports and you think they’ll use your faulting add that takes an extra cycle? Definitely not.
Anyway, this game is going to end up being won by people higher in the stack who pay the price for bounds checks and include them no matter what, because not having them is not tenable for their use case. This drives processor manufacturers to make these checks more efficient, which they have been doing for many years.
> Compilers emit lea on x86-64 these days to save ports and you think they’ll use your faulting add that takes an extra cycle? Definitely not.
"Faulting" addition should be as fast as wrapping addition and take a single instruction. Yes, I want hardware-accelerated overflow checking because it leads to more accurate results and prevents security vulnerabilities.
By the way, I want FPU operations to cause traps too (when getting infinity or NaN).
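Without hardware traps, the closest software equivalent is an explicit check after the operation. A minimal Rust sketch, where `FpError` and `check_finite` are hypothetical names and not part of any library:

```rust
#[derive(Debug, PartialEq)]
enum FpError {
    Infinite,
    NotANumber,
}

/// Software stand-in for a trapping FPU: rejects results that are
/// infinite or NaN instead of letting them propagate.
fn check_finite(x: f64) -> Result<f64, FpError> {
    if x.is_nan() {
        Err(FpError::NotANumber)
    } else if x.is_infinite() {
        Err(FpError::Infinite)
    } else {
        Ok(x)
    }
}
```

For example, `check_finite(f64::MAX * 2.0)` yields `Err(FpError::Infinite)`. Like the integer case, this costs extra instructions per operation, which is exactly the overhead a hardware trap would avoid.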
But there’s inherently more work. You need to keep track of some extra state and when the overflow actually occurs you need to unwind processor state and deliver the exception. You can make this cheap but it definitely cannot be free. From the words you’re using I feel like you have a model in your head that if you can just encode something into an instruction it’s now fast and that instructions are the way we measure how “fast” something is, but that’s not true. Modern processors can retire multiple additions per cycle. What this will probably look like is both of them are single instructions and one of them has a throughput of 4/cycle and the other one will be 3/cycle and compiler authors will pick the former every time.
With your favoured ISA style you can't just put 4 or 8 checked overflow add instructions in a row and run them all in parallel because they all write to the same condition code flag. You have to put conditional branches between them.
Or, if you want an overflowing add to trap, then you can't do anything critical in the following instructions until you know whether the first one traps or not, e.g. if the instructions are like "add r1,(r0)+; add r2,(r0)+; add r3,(r0)+; add r4,(r0)+". In this example you can't write back the updated r0 value until you know whether the instruction traps or not. It is even worse if you reverse the operands and have an RMW instruction.
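As a rough model of the flagless alternative: each check below produces its own boolean (in hardware terms, its own general-purpose register), so the four additions do not depend on one another and a single branch at the end covers all of them. This is only a sketch with a made-up function name; how a compiler or core actually schedules it is not guaranteed.

```rust
/// Adds four independent pairs and reports overflow once at the end.
/// Each overflow bit is a separate value, so no add has to wait on a
/// shared flag written by a previous add.
fn add4_checked(a: [i32; 4], b: [i32; 4]) -> Option<[i32; 4]> {
    let mut sums = [0i32; 4];
    let mut any_overflow = false;
    for i in 0..4 {
        let (sum, overflowed) = a[i].overflowing_add(b[i]);
        sums[i] = sum;
        any_overflow |= overflowed; // combine the flags, branch only once
    }
    if any_overflow {
        None
    } else {
        Some(sums)
    }
}
```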
And, amusingly, the instruction count is also very competitive, especially inside loops.
Furthermore, it achieves all of that with a much simpler ISA that matches x86 and Arm in features, while having an order of magnitude fewer instructions to implement and verify.
Compiler output is not a good way to show off the best of an ISA (which is more an indictment of how bad compilers actually are at optimising for code density). Look at the demoscene. x86 can be an order of magnitude denser than lame compiler output.
[1] https://github.com/riscv/riscv-isa-manual/releases/download/...