Hacker News new | ask | show | jobs
by wizeman 1326 days ago
This wouldn't work for C/C++ because in these languages signed integer overflow is undefined behavior.
2 comments

That is correct. A serious mistake in C.

Go was designed by (among others) the father of Unix Ken Thompson, with an understanding of the mistakes of C and C++.

Another example is that Go requires explicit integer casts (disallowing implicit integer casts) to avoid what is now understood to be an enormous source of confusion and bugs in C.

You can understand Go as an improved C, designed for a world where parallel computing (e.g., dozens of CPU cores) is commonplace.

> A serious mistake in C.

Well, that's arguable. This "mistake" could be fixed in C tomorrow without breaking the semantics of any existing C code, but notice that this hasn't been "fixed" in any of the latest C standards, so perhaps it's still there for a reason.

And the reason it hasn't been "fixed" is that compilers can optimize code better if they can assume that signed addition won't overflow.

So it's more of a trade-off rather than strictly being a disadvantage.

It's also something you can "fix" in your own code if you want to, by passing a compiler flag (-fwrapv in gcc), although arguably, at that point your code wouldn't be strictly C-standard compliant anymore. Or by using some library that handles arithmetic overflow by wrapping around, which could be implemented in standard C.

> Another example is that Go requires explicit integer casts (disallowing implicit integer casts) to avoid what is now understood to be an enormous source of confusion and bugs in C.

I agree with you on this, although forcing explicit casts also makes the code more verbose and can make it harder to understand what's going on.

I think a balanced approach is requiring explicit casts only for the "tricky" cases, i.e. when the values might become truncated, and possibly also when sign-extension might be necessary and therefore might result in something the programmer didn't expect.

But if I were to design a language I'm not sure that I would require explicit casts for e.g. promoting a uint16_t to a uint32_t...

> You can understand Go as an improved C, designed for a world where parallel computing (e.g., dozens of CPU cores) is commonplace.

That's a bit of a hot take :) Let me know when the Linux kernel starts to get rewritten in Go ;)

> And the reason it hasn't been "fixed" is that compilers can optimize code better if they can assume that signed addition won't overflow.

Everyone says that and they have no proof.

> Everyone says that and they have no proof.

Here's an entire blog post with proof:

https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...

None of that requires or even supports undefined overflow.

> Signed integer expression simplification [...]

This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.

> the index calculations are typically done using 32-bit int

No, they're done using size_t; that's what size_t is for.

> undefined overflow ensures that a[i] and a[i+1] are adjacent.

No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.

(Also, note that this does not mean the optimizer is allowed to access memory at &a[i+1], since that might be across a page boundary (or even the wraparound from address ...FFF to 0), which makes adjacency less helpful than one might hope.)

> Value range calculations [...] Loop analysis and optimization [...]

Allowed but not required to retain excess bits on signed overflow.

> > Signed integer expression simplification [...]

> This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.

> > Value range calculations [...] Loop analysis and optimization [...]

> Allowed but not required to retain excess bits on signed overflow.

What does that even mean? That the code would have different behavior with a different compiler, different optimizations or when you slightly change it (depending on whether the compiler chooses to keep wider intermediate results or not)?

If understand you correctly, then depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3, because in the first case you would be simplifying the expression but in the second case you couldn't.

That would be quite surprising, to say the least.

> No, they're done using size_t; that's what size_t is for.

  for (int i = 0; i < 256; i++)
      a[i] = b[i] + c[i];
So you've never seen code like this? Not all arrays are huge and for small indices people usually go for `int`.

Also, when doing arithmetic with indices, you might need to represent a negative index, so `size_t` wouldn't work. You'd need to use `ssize_t`, which is a signed integer which would benefit from these optimizations.

But even then you might use an `int` if you know the arithmetic result will fit.

> No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.

Not if `i` is a signed integer, say, `int` or `int8_t`. Which is the point of the optimization.

Blog post has theory but no performance measurements.
Well, then you have moved the goal post, because initially you didn't ask for performance measurements, you asked for proof that compilers can optimize code better if they can assume signed arithmetic won't overflow.

And the blog post lists dozens of optimizations which can indeed be performed due to that decision.

The benefits of these optimizations will vary depending on the actual code and on the architecture/CPU, of course.

It will be greater for hot and tight loops that benefit from these optimizations and less important for cold parts of the code.

It will also be greater for more limited CPUs and less important for more sophisticated ones.

You could write the same approach in C as `(size_t)i+(size_t)j` without UB. The real reason it doesn't work in C is because a memory region can be large enough to still overflow in that case.
That's not exactly the same approach, because you're doing unsigned addition while the Go code is doing signed addition.

And technically speaking, I think C doesn't guarantee that 'size_t' is at least as large as a 'signed int' (even though this is true on all platforms that I know of), so your approach would fail if that weren't the case. Although, you could use 'ssize_t' instead of 'int', or 'unsigned int' instead of 'size_t' to fix that.

> The real reason it doesn't work in C is because a memory region can be large enough to still overflow in that case.

The Go code we are discussing has nothing to do with memory regions, it's a generic binary search function, so it can be used for e.g. bisecting git commits. It doesn't require the calling function to use arrays.

Although yes, if the calling code were trying to do a binary search on an array, conceptually it could fail, but in that case you could argue the bug would be in the calling function, because it would be trying to pass the array length into a binary search function which only accepts an `int` or `ssize_t` function parameter, which could result in the array length being truncated. But strictly speaking, this would not be an arithmetic overflow issue.

That said, I would just fix the code so that it works for the full 'size_t' range, since the most common use case of a binary search function is indeed to do searches on arrays. In that case, the Go approach wouldn't work indeed.

> I think C doesn't guarantee that 'size_t' is at least as large as a 'signed int'

That doesn't matter, because size_t is large enough to hold any array index (that's kind of[0] the defining property of size_t), so any array index in a signed int can be safely converted to size_t. The real problem is that (using 16-bit size_t for illustrative purposes) if you have, say, x = (size_t)40000 and y = (size_t)50000 into a 60000-element array, x+y = (size_t)90000 = (size_t)24464, which means (x+y)/2 = 12232, which is the completely wrong array element.

0: Technically, size_t is large enough to hold any object size, but array elements can't be smaller than char (sizeof can't be less than 1), so a array can't have more elements than it's sizeof.

> > I think C doesn't guarantee that 'size_t' is at least as large as a 'signed int'

> That doesn't matter, because size_t is large enough to hold any array index (that's kind of[0] the defining property of size_t), so any array index in a signed int can be safely converted to size_t.

Well, the Go code we're discussing has nothing to do with arrays or array indices, so `size_t` doesn't help here.

Go look at the code :) It's a generic function for doing binary search, which accepts an `int` as a function argument, specifying the search size.

The code is then doing:

  h := int(uint(i+j) >> 1) // avoid overflow when computing h
Replacing the Go expression `uint(i+j)` with `(size_t)i+(size_t)j` in C like morelisp proposed would not work correctly if `size_t` is smaller than `int`.

That's the point I was making.

Pretty sure that’s not the case for 64 bit systems since you can “only” allocate about 48 bits of address space (maybe slightly more on newer systems).

For 32 bit systems using 64bit instead of size_t would similarly solve the problem.

Well, that's not something the C standard (or POSIX, etc) guarantees, is it?

Conceptually, a 64-bit kernel today could allow your program to allocate (almost) the entire 64-bit address space, assuming it does memory overcommit (like Linux) and/or uses some kind of memory compression (like Linux supports as well).

There might be some MMU limitations on today's mainstream systems, but this doesn't mean that all 64-bit systems have those limitations or that those limitations will remain there in the future.

So your code would break as soon as a new system comes along without those limitations.

Also, this would be even more true if the code and stack would be stored in different address spaces, as theoretically that would even allow you to allocate the entire address space, I think.

The system you describe simply doesn’t exist, standards or no. A 64-bit kernel can’t hand out 64-bits worth of addresses because no CPU built today supports it.

A 48-bit index to an array can represent >240TBytes of RAM minimum - if your records are > 1 byte, you have significantly higher storage requirements. The largest system I could find that’s ever been built was a prototype that has ~160TiB of RAM [1]. Also remember. To make the algorithm incorrect, the sum of two numbers has to exceed 64bits - that means you’d need >63-bits of byte-addressable space. That just simply isn’t happening.

Now of course you might be searching through offline storage. 2^63 bits is ~9 exabytes of an array where each element is 1 byte. Note that now we’re talking scales of about about the aggregate total storage capacity of a public hyperscaled cloud. Your binary search simply won’t even finish.

So sure. You’re technically right except you’d never find the bug on any system that your algorithm would ever run on for the foreseeable future, so does it even matter?

As an aside, at the point where you’re talking about 48-bits worth of addressable bytes you’re searching, you’re choosing a different algorithm because a single lookup is going to take on the order of hours to complete. 63-bits is going to take ~27 years iff you can sustain 20gib/s for comparing the keys (sure binary search is logarithmic but then you’re not going to be hitting 20gib/s). Remember - data doesn’t come presorted either so simply getting all that data into a linearly sorted data structure is similarly impractical.

> The system you describe simply doesn’t exist, standards or no. A 64-bit kernel can’t hand out 64-bits worth of addresses because no CPU built today supports it.

"Today" being the important part. That could change tomorrow. I could implement a 64-bit CPU right now that would support it (on an FPGA). It's not an inherent limitation, it's just an optimization that current CPUs do because we don't need to use the full 64-bit address space, usually.

Also, address space doesn't necessarily correspond 1-to-1 with how much memory there is.

For example, according to the AddressSanitizer whitepaper, it dedicates 1/8th of the virtual address space to its shadow memory. It doesn't mean that you need to have 2 exabytes of addressable storage to use AddressSanitizer, or that it reads or writes to all that space.

As I said, memory overcommit and memory compression (and also page mapping in general, as well as memory mapping storage and storage compression and storage virtualization, etc) allow you to address significantly more memory (almost infinitely more) than what you actually have.

There are other tricks with memory, page mapping and pointers which could break your code if it's not standards-compliant. This could happen for security reasons or because of new compiler or kernel optimizations or new features.

So I agree that this isn't a problem right now, unless you're doing something very esoteric, but if you want to have standards-compliant code and be more future-proof then you cannot rely on that.

There is also the point that the Go code that we're discussing has nothing to do with arrays, memory or address spaces, because it's a generic binary search function that works for any function "f" passed as an argument.

For example, it can be used to do a binary search for finding the zero of a mathematical function (i.e. for finding which value of `x` results in `y` becoming zero in the equation `y=f(x)`) and this has nothing to do with address spaces.

> I could implement a 64-bit CPU right now that would support it (on an FPGA). It's not an inherent limitation, it's just an optimization that current CPUs do because we don't need to use the full 64-bit address space, usually.

You’re hand waving away way too much complexity. Please do build this system. Keep in mind that addressing 63bits of memory with huge tables on will use up > 2 Tera worth of PTEs which translate to what, 16 Terabit worth of memory? This is simply an order of magnitude more than dedicated machines ship with. You’re certainly not getting an FPGA with that.

> For example, according to the AddressSanitizer whitepaper, it dedicates 1/8th of the virtual address space to its shadow memory. It doesn't mean that you need to have 2 exabytes of addressable storage to use AddressSanitizer, or that it reads or writes to all that space.

I think you’re failing to appreciate how large 2^63 bytes is.

> As I said, memory overcommit and memory compression (and also page mapping in general, as well as memory mapping storage and storage compression and storage virtualization, etc) allow you to address significantly more memory (almost infinitely more) than what you actually have.

See point above. Such a system is just not likely to exist in your lifetime.

> but if you want to have standards-compliant code and be more future-proof then you cannot rely on that.

All code has a shelf life. What’s the date you’re working on here? I’m willing to bet it’s not an issue by the end of this century.

> For 32 bit systems using 64bit instead of size_t would similarly solve the problem.

~~C does not guarantee a 64 bit type exists.~~ This is not really correct these days.

> C does not guarantee a 64 bit type exists.

Isn't (signed/unsigned) 'long long int' mandatory since C99?

It says: 'There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int'.

My cursory search for 'long long' in the standard didn't find anything about it being optional...

You appear to be correct, though in my defense I didn't give a version and I have definitely been stuck on such a compiler long after 1999. (And I suspect they're still over-represented for 32 bit systems.)