by ncruces 52 days ago
It's not really signed vs unsigned that's the issue, IMO. It's (mostly, in C) undefined behavior and implicit conversions?

I'm not sure Go is saner just because len is an int. Well, maybe, depending on how you look at it. Defining len to be a signed int means the largest valid len is half your address space, which also means half of all possible index values are always invalid, and that makes some things easier.

But it's really that integer arithmetic is not undefined behavior regardless of signedness, that bounds are checked, and that even indexing your slice with an int64 on a 32-bit CPU does the full correct bounds check. In fact, you can use any integer type as an index.

Given all of the above, indexing with a uint or an int is actually indifferent. In either case, the bounds check is a single unsigned < len compare (despite the fact that len is signed).
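To make the single-compare point concrete, here is a minimal sketch in Rust rather than Go. The function name `in_bounds` is hypothetical, and it only illustrates the arithmetic; it is not a claim about the exact code any compiler emits.

```rust
/// Returns true when `i` is a valid index for a container of length `len`.
/// Assumes `len >= 0`, which a signed length always satisfies.
fn in_bounds(i: i64, len: i64) -> bool {
    // Reinterpreting a negative `i` as unsigned yields a value >= 2^63,
    // which is larger than any non-negative `len`. One unsigned compare
    // therefore rejects both negative and too-large indexes.
    (i as u64) < (len as u64)
}
```

With a length of 4, indexes 0 through 3 pass, while 4, -1, and `i64::MIN` are all rejected by the same comparison.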

What's really painful is trying to handle a full 32-bit address space with 32-bit addresses and sizes, like in Wasm: you need 33-bit math. So in a sense, limiting sizes to 31-bit (signed) does help. But at the language level, IMO, the rest matters more.
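A rough sketch of why 33 bits are needed, in Rust. The names `MEM_SIZE`, `range_ok_u32`, and `range_ok_wide` are made up for illustration, and the "memory" is just a size, not a real Wasm linear memory.

```rust
/// Size of a full 32-bit address space: 2^32 bytes, which does not fit in a u32.
const MEM_SIZE: u64 = 1 << 32;

/// Naive range check done entirely in 32 bits.
/// `ptr + len` can wrap around, so an out-of-range access may be accepted.
fn range_ok_u32(ptr: u32, len: u32, mem_size: u32) -> bool {
    ptr.wrapping_add(len) <= mem_size
}

/// Same check with widened arithmetic: the sum of two 32-bit values
/// needs up to 33 bits, so it is computed in 64 bits here.
fn range_ok_wide(ptr: u32, len: u32, mem_size: u64) -> bool {
    u64::from(ptr) + u64::from(len) <= mem_size
}
```

For `ptr = 0xFFFF_FFF0` and `len = 0x20`, the 32-bit version wraps to `0x10` and accepts the range, while the widened version rejects it.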

For signed overflow we have sanitizers, and for conversions C compilers have warnings. Bounds checking can also be done with sanitizers (but is a bit more tricky). So no, I do not think the undefined behavior is really a big problem. In fact, it helps us find the problem because every overflow can be considered a programming error.

Errors due to unsigned wraparound are a much bigger issue, because they lead to subtle bugs where neither automatic warnings nor sanitizers help, exactly because the behavior is well-defined and no automatic tool can tell whether it is intended or wrong.
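As a small illustration of the kind of bug meant here, the sketch below mimics C's unsigned semantics in Rust by calling `wrapping_sub` explicitly. The function names and the capacity scenario are invented for the example.

```rust
/// Mimics C-style unsigned subtraction: the result silently wraps modulo 2^32.
fn remaining_c_style(capacity: u32, used: u32) -> u32 {
    capacity.wrapping_sub(used)
}

/// A follow-up check that looks reasonable but passes on a wrapped value.
fn has_room_c_style(capacity: u32, used: u32, need: u32) -> bool {
    remaining_c_style(capacity, used) >= need
}
```

If `used` ever exceeds `capacity`, say 10 against 8, the "remaining" space becomes a huge positive number and `has_room_c_style` returns true. Nothing here is undefined, so a tool has no basis for flagging it.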

> Errors due to unsigned wraparound are a much bigger issue

This is a type design mistake. The unsigned integers should not wrap by default. It makes absolute sense, given all the constraints and the fact that it's doing New Jersey "implementation simplicity dominates" design, that K&R C only provides a wrapping unsigned type, but that's an excuse for K&R C, which is a 1970s programming language.

The excuse gets shakier and shakier the further you move past that. C3 even named these types differently, so they're certainly under no obligation to provide the wrapping unsigned integers as if that's just magically what you mean. In most cases it's not what you mean. The excuse given in the article is way too thin.

Rust's Wrapping<u32> is the same thing as the wrapping 32-bit unsigned integer in C or C++ today, but most people don't use it because they do not actually want the wrapping 32-bit unsigned integer. This is a "spelling matters" ergonomics class again like the choice to name the brutally fast but unstable general comparison sort [T]::sort_unstable whereas both C and C++ leave the noob who didn't know about sort stability to find out for themselves because they name this just "sort" and you get to keep both halves when you break things...

Unsigned is certainly a misnomer for a wrapping type. That does not mean it is a type design mistake. And I agree that people should not use it much.

But what I do not believe is that there is a real need for a non-wrapping non-negative integer type.

> But what I do not believe is that there is a real need for a non-wrapping non-negative integer type.

So the most obvious counterexample is so obvious you might not even have remembered it's a type: the unsigned 8-bit integer, or byte.

But frankly if you don't have the wrapping mistake they just make for a pretty good general purpose index, they're a useful counter, there's a reason we called these the "Natural numbers".

I am not convinced. A byte is for low-level access to memory; you shouldn't really do any computation with it, except maybe low-level bit-fiddling or crypto, but then the non-wrapping non-negative integer is not correct either.

Natural numbers are nice, but then we invented zero and negative numbers so we got a group structure for addition, which is really useful. Because even for a counter, or some index, you may want to do addition and subtraction, and then you definitely do not want a non-wrapping non-negative integer for intermediate results.
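One way to read the point about intermediate results, sketched in Rust: widen to a signed type before subtracting two indexes. The helper name `index_delta` is hypothetical.

```rust
/// Signed distance between two unsigned indexes.
/// Widening to i64 first means the difference of any two u32 values fits,
/// so the intermediate result can be negative without wrapping or failing.
fn index_delta(a: u32, b: u32) -> i64 {
    i64::from(a) - i64::from(b)
}
```

Here `index_delta(3, 5)` is simply -2, and the result can be range-checked afterwards before it is used as an index again.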

And the Rust design with unsigned types, where subtraction does not return a signed type but may fail at runtime or silently produce the wrong result, seems the worst possible design imaginable to me.

Was this last part added or did I just miss it? Huh.

> And the Rust design with unsigned types, where subtraction does not return a signed type but may fail at runtime or silently produce the wrong result, seems the worst possible design imaginable to me.

You can ask for whatever you meant, and indeed asking for what you meant is crucial here because if we express ourselves we get the desired results.

For example u8::borrowing_sub lets us do the arithmetic style you may have learned in primary school in which we track whether we "borrowed" one because of our subtractions, this might be useful in some places and is certainly easier to understand.

u8::checked_sub tells us either the answer or that it would overflow, which might allow us to take a different course of action and not need the subtraction.

u8::saturating_sub performs saturating arithmetic: if it would overflow we get the nearest representable value in the appropriate direction instead (0 for this subtraction). This often makes sense in e.g. signal processing.

u8::unchecked_sub promises we know the subtraction doesn't overflow and so no checks are needed, this is a performance optimisation if you really need it.

u8::wrapping_sub_signed performs the wrapping arithmetic you say is sometimes a good idea, with specifically a signed i8 parameter rather than an unsigned one if we want that.
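A short sketch of how a few of these flavors differ on the same inputs. It sticks to four standard library methods on `u8`; `wrapping_sub` and `overflowing_sub` are not named in the list above and are included only for contrast, and the helper `sub_variants` is hypothetical.

```rust
/// Collects the results of several `u8` subtraction flavors for `a - b`.
fn sub_variants(a: u8, b: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        a.checked_sub(b),     // None if the result would be negative
        a.saturating_sub(b),  // clamps at 0 instead of going negative
        a.wrapping_sub(b),    // result modulo 256
        a.overflowing_sub(b), // wrapped result plus an overflow flag
    )
}
```

For `3 - 5` this gives `None`, `0`, `254`, and `(254, true)`; for `5 - 3` all four agree on 2. The caller picks the behavior by name instead of getting one of them implicitly.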

The truth here is that you might want a lot of different operations and the C choice is not only to provide a single choice, which made a lot more sense 50+ years ago than it does today, but to provide a singularly bad default.

The wrapping APIs do come up a lot in cryptography, but in bit twiddling I think they're as often a hindrance because we actually want to be pulled up short if we're trying to squeeze things where they won't fit.

I have definitely written C code which tries to use 257 values for a byte, with zero playing both its role as "just zero" in some places, and then also serving as 256 because "it's never zero" in other places, and of course this is a nasty bug if one of those "it's never zero" zeroes gets into the "it's just zero of course" code paths or vice versa.

The "Wrapping will fix my arithmetic ordering" thing is in this article too and I think that's also a terrible idea, maybe even worse than the wrapping unsigned integer types themselves because it leads to a muddled idea of what's really going on.

> The unsigned integers should not wrap by default.

What would you do instead?

How about just panic? If a wrap happens and you don't expect it, it's almost always a severe bug.

Then, dedicated APIs for wrapping behavior where you expect it to happen.

Because it adds 4-6x overhead to all integer arithmetic.

Do you always run with those sanitizers in place?

Just this week I had a C compiler silently delete an entire function call because of UB (an infinite loop without side effects). It took me a day to figure out. So that's a problem for me.

I don't think I've ever had a hard-to-debug issue in Go because of signed/unsigned wraparound. Particularly a memory issue.

If anything, and there I guess I agree with the article, I wish Go had implicit conversions to wider types: to make the problematic ones stand out.

I guess the reason it doesn't is that they're different named types, which would be a problem when you create a named type for the purpose of forcing explicit type conversions. But maybe the default ones could implicitly implement a numeric tower, where exact conversions can be implicit.

That depends. But some sanitizers are cheap enough that you can almost always run them.

Regarding infinite loops, C++ and C differ with C++ being more aggressive. But also compilers differ with clang being more aggressive. https://godbolt.org/z/Moe6zYKqo

In general, I do not recommend using clang if you worry about UB. gcc is a bit more reasonable and also has better warnings.

> In fact, it helps us find the problem because every overflow can be considered a programming error.

High performance, lock-free FIFOs/channels are commonly implemented in a way that requires overflow.
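For context, the index arithmetic such queues typically rely on can be sketched as below. This is a single-threaded illustration only: the `Ring` type and its methods are hypothetical, a real lock-free queue would use atomics and memory ordering for the counters, and the tiny `u8` counters are chosen just to make the wraparound easy to trigger.

```rust
/// Ring buffer with free-running counters that are allowed to wrap.
struct Ring {
    buf: [u32; 4], // capacity must be a power of two that divides 256
    head: u8,      // total number of pops, modulo 256
    tail: u8,      // total number of pushes, modulo 256
}

impl Ring {
    /// Starts both counters at `start` so wraparound can be exercised quickly.
    fn starting_at(start: u8) -> Self {
        Ring { buf: [0; 4], head: start, tail: start }
    }

    /// Number of stored items; correct even after `tail` has wrapped past `head`.
    fn len(&self) -> usize {
        self.tail.wrapping_sub(self.head) as usize
    }

    /// Appends `v`, or returns false when the buffer is full.
    fn push(&mut self, v: u32) -> bool {
        if self.len() == self.buf.len() {
            return false;
        }
        let slot = self.tail as usize % self.buf.len();
        self.buf[slot] = v;
        self.tail = self.tail.wrapping_add(1);
        true
    }

    /// Removes and returns the oldest item, or None when empty.
    fn pop(&mut self) -> Option<u32> {
        if self.len() == 0 {
            return None;
        }
        let slot = self.head as usize % self.buf.len();
        self.head = self.head.wrapping_add(1);
        Some(self.buf[slot])
    }
}
```

The counters are never reset; `tail - head` stays meaningful precisely because both wrap modulo the same power of two, which is the intentional overflow the comment refers to.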