Hacker News new | ask | show | jobs
by geokon 1636 days ago
"runtime checks on array/slice access and integer under/overflow"

I'm probably missing something. I feel like you'd get this and a lot of the other benefits you list if you just compile C/C++ with Debug options - or run with Valgrind or something. Are you saying you get automatic checks that can't be disabled in Zig? (that doesn't sound like a good thing.. hence I feel I'm missing something :) )

6 comments

You're correct: you do get virtually all of the safety benefits of Zig by using sanitizers in C++. (Not speaking to language features in general, obviously.) In fact, C++ with sanitizers gives you more safety, because ASan/TSan/MSan have a lot of features for detecting UB.

Especially note HWASan, which is a version of ASan that is designed to run in production: https://source.android.com/devices/tech/debug/hwasan

The runtime safety checks are enabled in Debug and ReleaseSafe modes, but disabled in ReleaseFast and ReleaseSmall modes. They can be enabled (or disabled) on a per-scope basis using the `@setRuntimeSafety` builtin.
What "Debug options" are you imagining will provide runtime checks for overflow and underflow in C and C++ - languages where this behaviour is deliberately allowed as an optimisation?

In C it's simply a fact that incrementing the unsigned 8-bit integer 255 gets you 0 even though this defies what your arithmetic teacher taught you about the number line it's just how C works, so a "Debug Option" that says no, now that's an error isn't so much a "Debug Option" as a different programming language.

> What "Debug options" are you imagining will provide runtime checks for overflow and underflow in C and C++ - languages where this behaviour is deliberately allowed as an optimisation?

-fsanitize=undefined.

> In C it's simply a fact that incrementing the unsigned 8-bit integer 255 gets you 0 even though this defies what your arithmetic teacher taught you about the number line it's just how C works, so a "Debug Option" that says no, now that's an error isn't so much a "Debug Option" as a different programming language.

Yes, but this happens to be defined behavior, even if it’s what you don’t want most of the time. (Amusingly, a lot of so-called “safe” languages adopt this behavior in their release builds, and sometimes even their debug builds. You’re not getting direct memory corruption out of it, sure, but it’s a great way to create bugs.)

That’s a distinction without a difference. Yes it’s defined behavior. No, there isn’t a strictness check in C++ nor a debug option that will catch it if it causes a buffer overwrite or similar bug. Your comment is basically “no need to watch out for these bugs, they are caused by a feature”.
Did you read the same comment that I wrote? The very first thing I mentioned is a flag to turn on checking for this. And I mentioned the behavior for unsigned arithmetic is defined, but then I immediately mentioned that this behavior is probably not what you want and that other languages are adopting it is kind of sad.
People read the comment that you wrote, in which you, in typical "real programmer" fashion redefined the question so that it matched your preferred answer, by mentioning a flag that does not in fact, check for overflow and then clarifying that you've decided to check for undefined behaviour not for overflow.

[ saagarjha has since explained that in fact the UBSan does sanitize unsigned integer overflow (and several other things that aren't Undefined Behaviour) so this was wrong, left here for posterity ]

Machines are fine with the behaviour being whatever it is. But humans aren't and so the distant ancestor post says they liked the fact Zig has overflow checks in debug builds. So does Rust.

If you'd prefer to reject overflow entirely, it's prohibited in WUFFS. WUFFS doesn't need any runtime checks, since it is making all these decisions at compile time, but unlike Zig or indeed C it is not a general purpose language.

I would personally prefer a stronger invariant–overflows checked in release builds as well. Compile time checks are nice in the scenarios where you can make them work, of course, but not feasible for many applications.
> -fsanitize=undefined.

As you yourself almost immediately mention, that's not checking for overflow.

Was the goal here to show that C and C++ programmers don't understand what overflow is?

> Yes, but this happens to be defined behavior, even if it’s what you don’t want most of the time

The defined behaviour is an overflow. Correct. So, checking for undefined behaviour does not check for overflow. See how that works?

Sorry, perhaps I assumed a bit too much with my response. Are you familiar with -fsanitize=unsigned-integer-overflow? Your response makes me think you might not be aware of it and I wanted you to be on the same footing in this discussion.
I was not. So, UBSan also "sanitizes" defined but undesirable behaviour from the language under the label "undefined". Great nomenclature there.

It also, by the looks of things, does not provide a way to say you want wrapping if that's what you did intend, you can only disable the sanitizer for the component that gets false positives. I don't know whether Zig has this, but Rust does (e.g. functions like wrapping_add() which of course inline to a single CPU instruction, and the Wrapping<> generic that implies all operations on that type are wrapping)

But you are then correct that this catches such overflows. Thanks for pointing to -fsanitize=unsigned-integer-overflow.

Since we're on the topic of sanitizers. These are great for AoC where I always run my real input under Debug anyway, but not much use in real systems where of course the edge case will inevitably happen in the uninstrumented production system and not in your unit tests...

> It also, by the looks of things, does not provide a way to say you want wrapping if that's what you did intend

This would be something for C/C++ to add, which they (for reasons unknown to me) failed to make progress on. I applaud Rust for having them; they're table stakes at this point.

> Since we're on the topic of sanitizers. These are great for AoC where I always run my real input under Debug anyway, but not much use in real systems where of course the edge case will inevitably happen in the uninstrumented production system and not in your unit tests...

Right, they are not perfect. They're a bandaid; a valiant effort but even then not a particularly great bandaid. As I've described elsewhere, I don't actually think this situation is going to get any better :(

Runtime checks for signed overflow can be enabled with -ftrapv in GCC and clang. Having this option open is why some people prefer to use signed integers over unsigned.
C unsigned integers are completely well behaved: they do arithmetic modulo 2^n, and I hope you had a teacher that exposed you to that. C has many problems but that isn't one of them: overflow of unsigned is designed and documented to wrap around.
> C unsigned integers are completely well behaved: they do arithmetic modulo 2^n

Sadly, one rarely finds an excuse to work in the field Z_(2^32) or Z_(2^64), so while that behavior is well-defined, it's rarely correct for whatever your purpose is.

It is usually correct for my purposes (electronic design automation). When it isn't I need to figure out how to handle overflow. There is no automatic solution that is right in all cases, and a trap certainly isn't.
Array indices should arguably be unsigned (and struct/type sizes), so I'd say it's a lot more common than you imply.
I would have used to argue this, until I learned that Ada not only allows enum-indexing into arrays (compiler handled), but it also allows non-zero-based indexing.

Example: #1

    -- You just index into this using 100 .. 200 and let the compiler handle it.
    type Offsetted_Array is array (Positive range 100 .. 200) of Integer;
Example: #2

    -- Indexing using an enumeration (it's really just a statically sized map)

    -- An enumeration.
    type c_lflag_t is (ISIG, ICANON, XCase, ... etc.

    -- Create an array which maps into a single 32-bit integer.
    type Local_Flags is array (c_lflag_t) of Boolean
        with Pack, Size => 32;
Yes, Ada is pretty flexible in this regard, but I'm not sure how useful this actually is.
And exactly how is silent wraparound useful or even sane for that use case? You just proved the point of the one you responded to.
Wrapping is more sensible than negative indices.
It’s useful when working with bits and bytes and stuff. Aside from that, I fully agree.
I think the programmer should be able to specify what happens on overflow.

Maybe they're bit twiddling and silent wrapping is expected. Maybe they want the program to hard fault. Both are valid.

Perhaps you'd like Rust, where all the choices are offered, as functions on integers such as:

carrying_add (separate carry flag on input and output)

checked_add (result is None if it would overflow)

unchecked_add (explicitly unsafe, assumes overflow will never occur)

overflowing_add (like carrying_add but does not provide carry flag input)

saturating_add (the integer "saturates" at its maximum or, in the opposite direction, minimum - useful for low-level audio code)

wrapping_add (what C does for unsigned integers)

Rust also has variants that handle potentially confusing interactions e.g. "I have a signed integer, and I want to add this unsigned integer to it". With 8-bit integers, adding 200 to -100 should be 100, and Rust's provided function does exactly what you expected, whereas in C you might end up casting the unsigned integer to signed and maybe it works or maybe it doesn't. Likewise for "What's the magnitude of the difference between these two unsigned integers?" Rust provides a function that gets this right, without needing to consult a textbook for the correct way to tell the compiler what you want.

If you can't afford to ever get it wrong, WUFFS simply forbids overflow (and underflow) entirely, WUFFS programs that could overflow aren't valid WUFFS programs and won't compile.

Right, but in almost all languages one of the possible options is chosen by default because people want "+" to do something instead of having to specify each time. My personal opinion is that "+" should trap by default and the various other behaviors that are available (which 'tialaramex lists below as examples of which Rust provides) via some other mechanism. Some languages (C, C++) do it yet another wrong way in that "+" does a thing and there is no other way to do addition, and it's even worse because they picked one of the bad ones to use as a default.
-fsanitize=address,undefined,etc

There's even threadsanitizer which will tell you about deadlocks and unjoined threads.

Defaults matter a lot. Just because something is possible doesnt mean it is likely to happen.

Are most people going to enable asan, run their programs through valgrind extensively, or just do the easy thing and not do any of that?

This is also why neovim is being actively developed and successful and vim is slowly decaying. The path of least resistance is the path most well travelled.

Any project with a decent test coverage and CI can easily set up an ASAN / Valgrind run for their tests. I know I've had this on the last few C++ codebases I've worked with.
I would say that keeping the checks in runtime for release builds is the smart default. For most usages, removing the checks in release builds only adds security holes without measurable impact on performance.
Slices allow catching a lot of bounds errors that you can't reliably catch when using raw pointers.