by MaxBarraclough 2016 days ago
C is unusual though in that it's a minefield of undefined behaviour. It's very easy to think you truly understand the language but to have no real understanding of its many curious rules around undefined behaviour. You can't take a try-it-and-see attitude; you need to be a language lawyer.

Of course, even if you understand the rules you'll still accidentally write code that invokes undefined behaviour, which is much of the reason languages like Zig exist.


Zig also has undefined behavior.

See the whole section here:

https://ziglang.org/documentation/master/#Undefined-Behavior

The difference is that they try to check for it at compile time and, if you compile with safety checks, at runtime too. However, if you compile with ReleaseFast (which I assume most people will in production), those runtime checks are turned off and the undefined behavior still exists.

> (which I assume most people will in production)

No, people are supposed to use ReleaseSafe, especially "in production". You can still disable runtime checks in specific execution paths.

ReleaseFast is for applications where there are no disastrous consequences to a bug in the code, like videogames.

Having your account hacked is a disastrous consequence, though. And it's likely that the need for performance is on the GPU side of things rather than the CPU side, so ReleaseSafe should be good enough.
Oh! I didn't know that. In that case, sorry for mischaracterization.
Right, I hadn't meant to imply Zig is free of UB. It aims to improve on C's wild-west UB rules not by having no UB, but by having only a manageable dose of it, and supporting good optional runtime checks.

Zig's approach is essentially that of Ada. You can ask the Ada compiler for runtime checks, or promise the compiler that your code is free of undefined behaviour and have your code run at the speed of light (C), or go haywire if you got it wrong. Sadly, C is less suited to runtime checks: arrays are handled through raw pointers, so range checks aren't easy for the compiler to add automatically. You can, though, ask GCC to generate checks (trap-on-failure) for things like dereferencing NULL or signed arithmetic overflow.

How do C programmers handle integer overflow?

You can’t really check if an integer variable is greater than 2^32, since that itself would cause an integer overflow.

The most convenient way is probably to use a compiler builtin https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.... If you want to be portable, the next easiest way is to use a wide enough type (e.g. add or multiply two 32-bit numbers into a 64-bit one and verify the result is inside [INT_MIN, INT_MAX]). Otherwise, you can either do a precondition check (for addition, overflow occurs if (a > 0 && b > 0 && a > INT_MAX-b) || (a < 0 && b < 0 && a < INT_MIN-b)) or work with unsigned integers and check for wraparound after the operation. Finally, both clang and gcc have options to check for signed integer overflow at runtime (-fsanitize=signed-integer-overflow for gcc).
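A rough sketch of the first three options for addition, in case it helps. The function names (add_builtin, add_widened, add_prechecked) are made up for illustration, the builtin is a GCC/Clang extension rather than standard C, and the widening version assumes 32-bit operands as in the example above:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 1. Compiler builtin (GCC and Clang extension, not standard C).
      Returns true and stores a + b in *out when the sum fits in an int.
      On overflow it returns false; *out then holds the wrapped value. */
bool add_builtin(int a, int b, int *out)
{
    assert(out != NULL);
    return !__builtin_add_overflow(a, b, out);
}

/* 2. Portable widening: do the sum in 64 bits, then range-check it
      before narrowing back to 32 bits. */
bool add_widened(int32_t a, int32_t b, int32_t *out)
{
    assert(out != NULL);
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *out = (int32_t)wide;
    return true;
}

/* 3. Precondition check: test the operands before adding, so the
      overflowing addition is never executed. */
bool add_prechecked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((a > 0 && b > 0 && a > INT_MAX - b) ||
        (a < 0 && b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

All three return false instead of overflowing, e.g. add_prechecked(INT_MAX, 1, &r) reports failure and leaves r untouched.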

Of course, in practice this is too much effort for most people most of the time, so actual deployed C and C++ code is full of undefined behavior due to integer overflow. This paper has a great overview:

https://wdtz.org/files/tosem15.pdf

The overflow happens in the computation; you don't typically determine whether there was an overflow by checking the variable into which the result is saved (although this approach can work if you're using an unsigned integer type, which wraps around on overflow). You'd normally perform the check before you perform the addition (or whichever operation risks overflow). This can be fiddly to get right, but it's possible.
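For the unsigned case, a check after the fact could look roughly like this. It is only a sketch, and add_u32 is a made-up name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned arithmetic is defined to wrap modulo 2^32 for uint32_t,
   so it is safe to add first and inspect the result afterwards.
   Returns true when no wraparound occurred; *out always receives
   the (possibly wrapped) sum. */
bool add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    uint32_t sum = (uint32_t)(a + b);
    *out = sum;
    /* A wrapped sum is always smaller than either operand. */
    return sum >= a;
}
```

Here add_u32(UINT32_MAX, 1, &r) returns false and stores 0 in r, whereas the same trick on a signed int would already be undefined behaviour at the addition.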

Also, the maximum value that can be represented in a variable of type uint32_t is (2^32) - 1, not 2^32.

Can you be more specific? Since an integer cannot be greater than 2^32, checking for this doesn't make sense.
C is a great language for pub quizzes, because so many think mastering the K&R C book is all it takes.
Here are your pub-quiz questions: https://www.gimpel.com/archive/bugs.htm
I remember when Gimpel used to do magazine ads with those kinds of questions.

Most DDJ or The C/C++ Users Journal issues will have them.

The number of language-lawyer UB bugs I hit, compared with all other bugs, is close to zero when I write C. I can't actually recall any language-lawyer bugs.
Chromium, OpenSSL, the Linux kernel and the Windows NT kernel have all suffered from security vulnerabilities due to undefined behaviour. We can bet they will continue to suffer from such issues. It's not something you can avoid simply by being competent and careful.

edit: As lmm says, it's likely you have UB issues in your code you aren't aware of. That's not quite the same thing as having issues in your code due to not being a good enough language-lawyer. I've resolved some very subtle issues that found their way into a 'serious' C++ codebase, and I didn't spend that long in the C++ world. In most languages those issues simply couldn't have happened in the first place.

Sure, but isn't it mainly that the more bug-checked and field-tested the code is, the more obscure any bug that surfaces is?

GCC compiles to a lot of architectures. I have a hard time imagining any modern language compiling to all those platforms without quirks in practice.

> the more bug-checked and field-tested the code is, the more obscure any bug that surfaces is?

Right. A battle-tested codebase only has subtle errors, as the obvious ones will all have been fixed. An immature codebase has subtle errors and more obvious ones.

> GCC compiles to a lot of architectures. I have a hard time imagining any modern language compiling to all those platforms without quirks in practice.

Compiler bugs are a separate issue from undefined behaviour and surprising language subtleties. With mature compilers they're pretty rare, but they do happen.

JavaScript is a good example. There's no undefined behaviour in JavaScript. That's vitally important given that JavaScript engines have to be able to run untrusted code. If JavaScript code is able to cause undefined behaviour, that's a serious security issue in the engine. Such bugs do happen, of course, but they aren't all that common. Generally, JavaScript runs fine regardless of whether you're running on x86, AMD64, or AArch64. Same goes for Java.

(I admit I'm ignoring the possibility of a constrained/contained kind of undefined behaviour where the JavaScript context might see things go haywire but the process containing the JavaScript environment is unaffected.)

How do you know? One of the reasons they're so insidious is that code that hits them tends to work fine until it gets compiled with a newer version of the compiler. E.g. signed integer overflow did exactly what you expect in most compilers until fairly recently.
> E.g. signed integer overflow did exactly what you expect in most compilers until fairly recently.

How recently? Both gcc 4.1.2 (2007) and clang 3.0.0 (2011) optimize `x+1 > x` to true for a signed int `x` at -O1. And it probably goes way back; these are just the oldest compilers I found on godbolt.

https://c.godbolt.org/z/sdd15c
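For anyone who doesn't want to click through, the pattern in question is roughly the first function below. The other two are only a sketch of a well-defined way to ask the same question, and every function name here is invented for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined, so the compiler may assume x + 1 never
   wraps and fold this whole function to "return true". */
bool next_is_greater(int x)
{
    return x + 1 > x;
}

/* Well-defined alternative: compare against the limit instead of
   performing the addition that might overflow. */
bool can_increment(int x)
{
    return x < INT_MAX;
}

/* Makes the no-overflow precondition explicit in builds where
   assertions are enabled. */
int increment(int x)
{
    assert(can_increment(x));
    return x + 1;
}
```

Calling next_is_greater(INT_MAX) is itself undefined behaviour, which is exactly why the optimizer is allowed to ignore that case.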

Ah, point taken, but that's within the bounds of what many people expect; propagating the fact that the overflow is "impossible" to rearrange earlier control flow is more surprising and more recent.
> Ah, point taken, but that's within the bounds of what many people expect;

The thing is, it's very hard to draw the line once you go that route. Different people expect different things from undefined behavior. The best thing is to not expect anything sane. And if you are unhappy about certain undefined behaviors in the standard, then it's better to push the standard to define more behavior. Certain unnecessary undefined behaviors get resolved with newer standards, although I would expect significant pushback on defining the behavior of signed integer overflow.

I understand there's a good chance the next standard will specify that signed integer overflow results in an unspecified value, which would match the behaviour of older compilers and what (IME) most programmers tend to expect.
Yes, that is true. C compilers have some strange gotchas that you need to memorize, but my main point is that those problems, at least in my projects, are minuscule compared to off-by-one out-of-bounds array accesses or dereferencing null pointers.
I agree with you here, but even these two categories of runtime errors are much more painful in C/C++ than in most other languages.

As I mentioned elsewhere in the thread, you can ask gcc to trap if your code is about to dereference NULL, but the compiler can't easily detect all instances of out-of-bounds array access, due to the way arrays and pointers work in C. I believe Valgrind can help detect out-of-bounds errors at runtime, but in most languages you don't need a sledgehammer like Valgrind to find these common errors.