Hacker News new | ask | show | jobs
by ChancyChance 1172 days ago
Is the world finally realizing that "a + b" actually returns two values: pass/fail and the value if pass?

"a + b = c;" is a fundamentally flawed operation from a computer architecture perspective.

3 comments

First, you might have meant c = a+b;

The other way isn't really definable as an assignment mathematically.

And there is a lot more to it than just pass/fail. First, an addition doesn't fail, from a computer architecture perspective, the addition will always succeed, the only thing that could fail (in all the usual architectures) are possible memory fetch and store operations when not strictly dealing in register or immediate operands. Second, there is no fail flag. There is a overflow flag, an underflow flag, a zero flag, a sign and a few more that are irrelevant here. Any of overflow, underflow, zero or sign might mean that the operation "failed" depending on the types of your operand. Where the processor doesn't know anything about the type, so there won't be a straightforward 'fail' flag in any case. Only the library or compiler can use type information such as (un)signedness, bignum-ness, nonzeroness, desired wraparound (for modular types) and other possible types together with aforementioned flags to decide if that addition might have failed.

So nothing is fundamentally flawed, what you are describing is just insufficiently complex (because there is no fail flag, just a ton of other flags) or overly complex (because uint32_t c = a + b is modular 2^32 arithmetics and cannot fail).

> First, you might have meant c = a+b;

> The other way isn't really definable as an assignment mathematically.

This correction is condescending and unnecessary. Unless the person had never written a single line of code in their life, then they would obviously know "a+b" is not a modifiable lvalue.

And the point about pass/fail was also obviously not mean to capture the full complexity of the flags set by a CPU operation. It was very clearly a statement about how basic addition does not behave in computers the way it does on paper -- as simple as that.

From HN guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

You might be right on the first point. Edit: actually, you might not be. There are languages with compound lvalues and CPU architectures with multiple result registers (x86 being the best-known example). E.g. you can do "(result, flags, err) = do_stuff(a, b, c)" in Go, and x86 DIV storing different parts of the division result in different registers: https://c9x.me/x86/html/file_module_x86_id_72.html And generally with common CPU architectures, flags are another such result register that is always written, such that any operation like c := a+b is actually something like (c, flags) := a+b. And for stuff like multiplication, there is actually the notion of two result registers being the higher and lower part of the resulting operation, like (a * 2^32 + b) = c * d (see x86 MUL). Therefore some precision in language is necessary for the discussion (and yes, the different meanings of ==, =, := in various languages and mathematics are also confusing, even to me ;).

I do strongly disagree on the second one about pass/fail. This kind of nitpicking is necessary here, because the discussion is about a standard intended to precisely describe such operations, and how the underlying hardware might be utilized to execute them. Being imprecise in this context is dangerous, wrong, problematic and leads to the whole point of the discussion being lost in a sea of handwaving.

> The other way isn't really definable as an assignment mathematically.

It's an equality sign. See also, := and unification.

There is actually another option.

A more sophisticated type system.

Let's say you had some pseudocode like this:

    let a = 5
    let b = 12
    let c = a + b
The type of a would be Integer[5..5], the type of b would be Integer[12..12], the type of c would therefore be Integer[17..17]. In a more complex example:

    def foo(a: Integer[0..10], b: Integer[0..10]):
        return a + b
The return type of this function would be Integer[0..20].

This kind of type system can solve a number of issues, all but division by zero (which would probably still have to be solved with some kind of optional type).

If type inference dictates that the upper range of an integer would be too large to physically store in a machine data type, then you either resort to bignums or you make it a compilation error. By adding modular and saturating integer types you can handle situations where you want special integer behaviours. By explicitly casting (with the operation returning an optional) you can handle situations where you want to bound the range. This drastically simplifies a lot of code by removing explicit bounds checks in all places except where they are absolutely necessary. If for some reason you care about the space or computational efficiency of the underlying machine type, you can have additional annotations (like C's u?int_(least|fast)[0-9]+_t). If you absolutely must map to a machine type (this is usually misguided, unless you are dealing with existing C interfaces, for which such a language can provide special types) you can have more annotations.

Ada has something resembling this. I believe there are some other languages that implement similar features. I believe this sort of thing has a name, but I am not great with remembering the names of things.

Hopefully this is some food for thought.

I think the issue with this is that the worst-case bounds normally grow much faster than the actual values. And it can be easy to see for the programmer that the values can't actually grow that much because a is only big when b is small or some property like that, but then you have to convince the compiler of the same. I might be misremembering though.
> because a is only big when b is small or some property like that

Exactly, the expressiveness of the type system then (typically) becomes the obstacle: How do you express that a and b could each reach INT_MAX but their sum never exceeds INT_MAX?

Those kinds of assumptions are where you explicitly cast to a smaller ranged type with the option of an error if the sum does exceed a limit. The point of this type system is not to be able to fully encode every possible interaction between numbers in a system, but rather to remove unnecessary bounds checking in a bunch of cases and make it explicit in the few cases where you ARE actually making an assumption.
> Those kinds of assumptions are where you explicitly cast to a smaller ranged type

But how exactly do you do that? As mentioned, a and b individually can still reach INT_MAX.

I agree with your overall assessment, though. If a type system could represent (and recognize, and evaluate / automatically draw conclusions from) any possible restriction on the values of variables this would probably amount to the type system being able to carry out arbitrary mathematical proofs. The existence of such a type system seems rather unlikely.

In some pretend rust. Pretend Integer exists and has the properties I described:

    fn special_sum(a: Integer<0..(1<<32)>, b: Integer<0..(1<<32)>) -> Option<Integer<0..(1<<32)>>
    {
        Integer<0..(1<<32)>::try_from(a + b)
    }
The type of the expression a+b would be Integer<0..(1<<33)-1>.

Obviously you can design a language which makes this much more ergonomic. That language can also avoid needing to use a type larger than a 32 bit wide unsigned integer to perform the operation through a special optimisation. Moreover, there's nothing stopping the type system from being able to maintain more sophisticated rules, for example there's nothing stopping a product type of two Integer<0..(1<<32)> having a rule applied which ensures that the sum of the two values cannot exceed 2*32-1.

> but then you have to convince the compiler of the same.

In conventional parlance, this is known as "handling overflow".

> I believe this sort of thing has a name, […]

https://en.wikipedia.org/wiki/Refinement_type

But the concept is just a little bit over 30 years old. So don't expect it shows up in most mainstream languages before the end of the next 20 years, and don't expect it to come to the C languages ever.

Meanwhile in mainstream ML-land:

https://github.com/Iltotore/iron

(Or for the older version of the language: https://github.com/fthomas/refined)

(Please also note that for this feature both versions don't need language support at all but are "just" libraries, as the language is powerful enough to express all kinds of type level / compile time computations in general.)

I know C is never getting anything like this. But C is really just stuck being a very crappy ABI design language.

I do wish things like Rust had native support for stuff like this though.

And it really doesn't have to get in the way of anyone who insists on the more primitive type systems, but it would be nice to have a high level language which didn't have assembly-level (of abstraction) integer types. Why should I care that my machine works with square multiples of 8 bits at a time or care that they have specific wrapping behaviour.

> But the concept is just a little bit over 30 years old. So don't expect it shows up in most mainstream languages before the end of the next 20 years, and don't expect it to come to the C languages ever.

Any specific results/papers from (refinement) type theory you hope/expect to see implemented in the next 20 years?

Compilers do this sort of range tracking anyway. At least within a function. It's useful for loop optimisations.
It's a flaw that has a pretty good tradeoff: unparalleled readability.
It depends. If you want to study maths, yes. If you want to be a programmer:

[status, value] = add(a, b);

Is much more unparalleled-ly (?) readable from the perspective of how a computer actually operates. In reality, this:

uint c = (uint)a + (uint)b; // (to make that other guy happy)

is really:

c = (a + b) % (sizeof(uint));

in "C", which is less readable but far more accurate.

That’s 2^sizeof(uint)