by _a_a_a_ 868 days ago

  >  rd = add rs1, rs2
I dunno, how about

  rd = rs1 + rs2

Because from the CPU perspective, «+» is ambiguous as there is not one «addition» but many:

- signed add

- unsigned add

- add and carry

There are a few others in other ISAs; then there are the operand sizes (byte, half-word, word, long word, etc.), and the «+» operator captures neither the operand size nor the specifics, whereas

- rd = rs1 addu8 rs2

makes the intention clear: «add the lower 8 bits of rs2 to rs1, don't set the sign bit on overflow, and store the sum in rd».
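To make the distinction concrete, here is a minimal Python sketch of three 8-bit additions. The function names (`addu8`, `adds8`, `adc8`) are made up for illustration and do not model any particular ISA; the point is only that the same input bits can produce different side information depending on which «add» is meant.

```python
MASK8 = 0xFF

def addu8(a, b):
    """Unsigned 8-bit add: keep the low 8 bits and report the carry out."""
    total = (a & MASK8) + (b & MASK8)
    return total & MASK8, total > MASK8

def to_signed8(x):
    """Reinterpret an 8-bit pattern as a two's-complement value."""
    x &= MASK8
    return x - 0x100 if x & 0x80 else x

def adds8(a, b):
    """Signed 8-bit add: same result bits, but report signed overflow."""
    result, _ = addu8(a, b)
    exact = to_signed8(a) + to_signed8(b)
    return result, not (-128 <= exact <= 127)

def adc8(a, b, carry_in):
    """Add with carry in: used to chain additions wider than 8 bits."""
    total = (a & MASK8) + (b & MASK8) + (1 if carry_in else 0)
    return total & MASK8, total > MASK8

print(addu8(0xFF, 0x01))  # result bits plus unsigned carry out
print(adds8(0x7F, 0x01))  # result bits plus signed overflow
print(adds8(0xFF, 0x01))  # -1 + 1: carries out, but no signed overflow
```

Note that `0xFF + 0x01` carries out as an unsigned add but does not overflow as a signed add, which is exactly the information a bare «+» cannot express.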

Moreover, the «+» operation is commutative and the CPU instructions are not, e.g.

- rd = rs + #10

and

- rd = #10 + rs

mean two completely different things to the CPU, and the latter does not even have an encoding. The assembler is not the right place to put the smarts to figure out the programmer's intention, either, as it is a very straightforward 1:1 translator from assembly syntax to the ISA encoding.
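As an illustration of that 1:1 mapping, here is a small Python sketch that encodes two instructions. The comment above does not name an ISA, so using the RISC-V base integer encodings (`addi` as I-type, `add` as R-type) is an assumption made for the example; the function names are hypothetical.

```python
def encode_addi(rd, rs1, imm):
    """Encode RISC-V `addi rd, rs1, imm` (I-type, opcode 0x13, funct3 0)."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register index out of range")
    if not (-2048 <= imm <= 2047):
        raise ValueError("immediate does not fit in 12 bits")
    # Field layout: imm[11:0] | rs1 | funct3 | rd | opcode
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0x13

def encode_add(rd, rs1, rs2):
    """Encode RISC-V `add rd, rs1, rs2` (R-type, opcode 0x33, funct7 0)."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register index out of range")
    # Field layout: funct7 | rs2 | rs1 | funct3 | rd | opcode
    return (0 << 25) | (rs2 << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0x33

print(hex(encode_addi(1, 2, 10)))  # addi x1, x2, 10
print(hex(encode_add(3, 1, 2)))    # add  x3, x1, x2
```

In the `addi` layout the immediate occupies one fixed field and the register source another, so there is simply no bit pattern for «immediate first, register second».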

Good post! ok, for

  rd = rs1 addu8 rs2
use

  rd = rs1 +u8 rs2
etc. The + stands out, more clearly indicating addition. (to me anyway)

As for commutativity of +, it's not necessarily true. It depends entirely on what underlying operation + denotes. It's perfectly reasonable to use it for string concatenation, and that clearly isn't commutative. But if it's not in the case of an ISA, that's fine, just have the assembler reject it.
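A minimal sketch of what «just have the assembler reject it» could look like, in Python. The infix grammar, the `+u8`-style suffix, and the generated mnemonics are all hypothetical, invented only to illustrate the suggestion.

```python
import re

# Hypothetical infix form: rd = <reg> +<suffix> <reg | #imm>
PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(#?\w+)\s*\+(\w*)\s+(#?-?\w+)\s*$")

def parse_add(line):
    """Turn one infix add statement into a (mnemonic, rd, lhs, rhs) tuple."""
    m = PATTERN.match(line)
    if m is None:
        raise SyntaxError(f"not an add statement: {line!r}")
    rd, lhs, suffix, rhs = m.groups()
    if lhs.startswith("#"):
        # No encoding has the immediate as the first source operand.
        raise SyntaxError("immediate is only allowed as the right-hand operand")
    mnemonic = "add" + suffix + ("i" if rhs.startswith("#") else "")
    return mnemonic, rd, lhs, rhs

print(parse_add("rd = rs1 +u8 rs2"))
print(parse_add("rd = rs + #10"))
```

With this sketch, `parse_add("rd = #10 + rs")` raises `SyntaxError` instead of silently swapping the operands.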

> rd = rs1 +u8 rs2

AMD 29k (its descendants are still alive) has two further ADD operations:

– ADDU – IF unsigned overflow THEN trap (out of range), and

– ADDCS – IF signed overflow THEN trap (out of range).

The ADD instruction family in the HP PA-RISC 2.0 is, of course, one of the best ones out there:

  ADD,cmplt,carry,cond r1,r2,t
Purpose: To do 64-bit integer addition and conditionally nullify the following instruction.

General register r1 and general register r2 are added. If no trap occurs, the result is placed in general register t. The variable «carry_borrows» in the operation section captures the 4-bit carries resulting from the add operation. The completer, «cmplt», specifies whether the carry/borrow bits in the PSW (the processor status word) are updated and whether a trap is taken on signed overflow. The completer, «carry», specifies whether the addition is done with carry in.

So, under a certain set of conditions, a PA-RISC «add» operation, besides yielding a sum, can:

– result in a trap on a signed overflow;

– nullify the following instruction.

An instruction mnemonic would look like this:

  ADD,DC,TSV,C,<=,N r1, r2 – «add where r1 is less than or equal to negative of r2, and trap if specified conditions (DC,TSV) were met and nullify the following instruction»
HP PA-RISC 2.0 also has an «add and branch» instruction, naturally, embellished with branch conditions, «cond»:

  ADDB,cond,n r1, r2, target
If «n» flag is set, the «add and branch» will also nullify the following instruction, e.g.

  ADDB,*<=,N r1, r1, label_1
There is also an «add immediate left» instruction (a left shift and add a constant) and a «halfword parallel add» (adds multiple halfwords in parallel with optional saturation).
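For readers unfamiliar with lane-wise addition, here is a generic Python sketch of adding four 16-bit halfwords packed into 64-bit values, with optional saturation. The function name `hadd` and its `saturate` parameter are hypothetical, and this is not a model of the exact PA-RISC semantics; it only illustrates the idea.

```python
def hadd(a, b, saturate=None):
    """Add four 16-bit lanes of two 64-bit values independently.

    saturate: None (wrap around), "signed", or "unsigned".
    """
    def signed16(x):
        return x - 0x10000 if x & 0x8000 else x

    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        if saturate == "signed":
            # Clamp to the signed 16-bit range, then store as a bit pattern.
            s = max(-0x8000, min(0x7FFF, signed16(x) + signed16(y))) & 0xFFFF
        elif saturate == "unsigned":
            s = min(0xFFFF, x + y)
        else:
            s = (x + y) & 0xFFFF  # carries never cross into the next lane
        result |= s << shift
    return result

a = 0x0001_FFFF_7FFF_0001
b = 0x0001_0001_0001_0001
print(hex(hadd(a, b)))
print(hex(hadd(a, b, "unsigned")))
print(hex(hadd(a, b, "signed")))
```

The three calls give three different results for the same inputs, which is one more «addition» that a single operator would have to distinguish.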

How does one encode all of that with a «+» operator?

What about adding two decimals? Decimals are not used in modern CPU architectures, but they used to be commonplace and were encoded by completely separate instructions.
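A short Python sketch of why decimal addition needs its own operation: adding two packed-BCD bytes (two decimal digits per byte) digit by digit. The function name `bcd_add8` is made up, and the sketch assumes both inputs contain only valid decimal digits.

```python
def bcd_add8(a, b, carry_in=0):
    """Add two packed-BCD bytes; return (packed result, decimal carry out)."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    carry = 0
    if lo > 9:          # low digit overflowed past 9: wrap and carry
        lo -= 10
        carry = 1
    hi = (a >> 4) + (b >> 4) + carry
    carry_out = 0
    if hi > 9:          # high digit overflowed past 9
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out

result, carry = bcd_add8(0x19, 0x28)
print(hex(result), carry)          # decimal 19 + 28
print(hex((0x19 + 0x28) & 0xFF))   # plain binary add of the same bytes
```

The binary add of the same two bytes yields a different bit pattern, so the decimal case cannot share an opcode with the binary one.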

The bottom line is: assembly language is a slightly more human-friendly (e.g. «add r1, r2, r3») interface to the bit code (e.g. a fictional opcode of «0xf500010203» for «add r1, r2, r3») and, consequently, to the internal CPU machinery and its state; it is neither a high-level programming language nor a testament to the laws of mathematics.

I appreciate the detail you've given and you clearly know your stuff, but I think you're looking at it too deeply, at least compared to me. Basically, use a familiar notation where a familiar notation would be appropriate. Your complex add example is a good case where it probably isn't.

> What about adding two decimals?

  rx = ry +bcd rz
I'm really thinking of simple, simple changes. As you point out some situations aren't appropriate for it, in other cases maybe it is. Or maybe I'm just plain wrong and it never is appropriate. But it's just a suggestion and worth considering, no?
> […] where a familiar notation would be appropriate.

And this is the problem I have been trying to point out. «+» comes from math, and – in the world of math – there are no bytes, no half-words, no overflows, no carries, no branches and no traps, and the imaginary «register» size is infinite – an addition always succeeds however large the number is. There are no decimals, no floating point numbers, either – in math, there is no distinction. The lone exception is the handling of the explicit infinity values (-∞ and +∞). «Add and branch» or «add and trap» simply do not exist in math – math is abstract, and computing is concrete.

That is not the case in computing. The semantics of «add» varies across different CPU architectures, as the assembly language exposes the internal machinery and the internal state of a given CPU, which may or may not be appropriate for another CPU. For example, RISC-V does not have an integer overflow flag, whereas most other CPUs do. Assembly is a 1:1 representation of the binary code for a given CPU, and that is where it stops.
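To illustrate what the absence of an overflow flag means in practice, here is a Python sketch of a software check for signed 32-bit overflow. It mirrors the compare-and-branch idiom commonly used on RISC-V (the sum is less than one operand if and only if the other operand is negative); the function names are hypothetical.

```python
def to_s32(x):
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def add_with_overflow_check(a, b):
    """Wrapping 32-bit add plus a flag-free signed overflow test."""
    a, b = to_s32(a), to_s32(b)
    total = to_s32(a + b)                # what a wrapping add leaves in rd
    overflow = (b < 0) != (total < a)    # the two comparisons must agree
    return total, overflow

print(add_with_overflow_check(0x7FFFFFFF, 1))
print(add_with_overflow_check(5, -3))
```

On an ISA with a flags register the same information comes for free from the add itself, which is why the «meaning» of add is not portable between the two.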

In fact, I do find charm in your proposal, along the lines of «r3 = r1 +.u8 r2», «r3 = r1 +.c r2 or branch label1», «r3 = r1 +.c r2 or trap overflow» etc. It would not be assembly, though; more of a meta-assembly, which is fine. The real trouble is that the grammar would handle just one ISA and might quickly become complex and unwieldy if ported to another ISA with a rich set of addition operations that do not fit within the narrow constraints of such a design.

I'm glad you like the possibility of using the addition symbol or some variation on it. The idea that it would be ISA independent never crossed my mind, and perhaps just as well because, as you pointed out, it would bomb. For that (i.e. portability), use an HLL.

I fully understand your criticism, entirely correct, that machine arithmetic does not behave like mathematical arithmetic. I diverged from you where you say that '+' is mathematical therefore has a predetermined meaning. It has a conventional mathematical interpretation, often related to number addition, but it is a convention only and can be bent as far as you like provided you're clear about it (and a bit of common sense too; defining '+' as a square root operation would be pretty bloody stupid). No symbol in mathematics has any intrinsic meaning, including '+'.

Thanks for the interesting discussion!

Obviously that doesn't generalize for opcodes with 0, 1, 3 or more input operands. As a concrete example, consider a fused multiply-add operation: `rd = fma rs1, rs2, rs3` is consistent, but how would you convert it to an infix notation?

  rd = rs1 * rs2 + rs3
It's no different from parsing if/then/else. Provided one assignment maps to one opcode, it's unambiguous.
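A toy Python sketch of that «one assignment, one opcode» rule: each accepted statement shape maps to exactly one mnemonic, and anything else is rejected rather than split into several instructions. The grammar and the `lower` function are hypothetical, chosen only to illustrate the reply above.

```python
import re

# Each rule is one whole-statement shape mapped to one mnemonic.
RULES = [
    ("fma", re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\*\s*(\w+)\s*\+\s*(\w+)\s*$")),
    ("add", re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\+\s*(\w+)\s*$")),
    ("mul", re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\*\s*(\w+)\s*$")),
]

def lower(line):
    """Translate one infix assignment into one prefix instruction."""
    for mnemonic, pattern in RULES:
        m = pattern.match(line)
        if m:
            return f"{mnemonic} " + ", ".join(m.groups())
    raise SyntaxError(f"no single opcode matches: {line!r}")

print(lower("rd = rs1 * rs2 + rs3"))
print(lower("rd = rs1 + rs2"))
```

A statement such as `rd = rs1 + rs2 * rs3` matches none of the shapes in this sketch, so it is an error instead of being expanded behind the programmer's back.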

Pre-fix is the finest notation for many reasons. So, Intel syntax makes more sense than AT&T syntax.

Even C functions are pre-fix. [function_name arg1, arg2 ...] If schools were teaching pre-fix, we would be in a better position to appreciate Intel syntax.

Higher maths uses pre-fix, for example: y=f(x). Unfortunately, the above line is not pure pre-fix... = y f(x) would be pure pre-fix.

  x++
is a postfix kind-of function

  a ? b : c
is ternary infix

Edit: arguably

  /* comment! */
is a function that takes an unquoted string. It's a no-op function so the compiler removes it at compile time (giggle)