This is a very naive statement. Sure there are a handful of good companies that enable every single compiler warning, and fix those warnings, and then run the code through Coverity, and fix all those problems too. Almost no one else does. The amount of terrible C in the real world is enormous.
>> Sure there are a handful of good companies that enable every single compiler warning.
You think so? Every company I've worked for, or where people I know have worked, has always enabled -Wall for its C and C++ code. Most OSS software compiles with all warnings enabled.
I think the issue with undefined behavior in C/C++ is extremely overblown. Aside from fun academic examples like 'what does i++++i++ evaluate to', there isn't actually all that much undefined behavior, or that many gotchas, in C/C++. I would say there are fewer than in other languages I know.
Signed overflow problems are everywhere, even in carefully written code. Using 'int' instead of a more specific type is a code smell. Then there's security code that presumes that, because you wrote ptr != NULL, the check is actually carried out; code that does type punning; code that doesn't know about aliasing. It goes on and on.
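For the signed overflow point, here is a minimal sketch of a range check that runs before the addition, so the overflowing operation is never performed. `checked_add` is a hypothetical helper, not something from this thread; GCC and Clang also provide `__builtin_add_overflow` for the same job.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true only when the
   sum fits in an int. The check happens before the addition, so signed
   overflow (undefined behavior) can never occur. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* a + b would overflow */
    }
    *out = a + b;
    return true;
}
```

Note that the naive check `if (a + b < a)` does not work: the overflow has already happened by the time the comparison runs, and the compiler is allowed to assume it never does.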
You need to know that the problem exists in order to know that you have a problem. There are many C programmers who learned C back in the 1980s who don't even realize these are issues.
I'm still adding compiler-specific annotations to custom variadic logging functions in codebases I inherit, to get format string checking, and finding multiple bugs.
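A sketch of the kind of annotation meant here, assuming GCC or Clang (the `format` attribute is their extension; `log_to_buffer` and `PRINTF_LIKE` are hypothetical names):

```c
#include <stdarg.h>
#include <stdio.h>

/* Expands to the GCC/Clang format attribute, or to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_idx, first_arg) __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical logger: argument 3 is a printf-style format string and the
   variadic arguments start at argument 4, so -Wformat can check each call. */
static int log_to_buffer(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

static int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}
```

With -Wall, a call like log_to_buffer(buf, sizeof buf, "%s", 42) then draws a -Wformat warning instead of compiling silently.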
...but please, for the love of Mike, don't ship source code with -Werror.
There's nothing like the experience of trying to fix somebody else's code that compiled fine on gcc version 8.97 but fails on gcc version 8.98, because the new compiler emits some new warnings and the build treats them as errors.
...and you've got stuff to do, and the program isn't even broken.
> … and you've got stuff to do, and the program isn't even broken.
Well, it may be — that's one of the problems with C: you never really know for sure if a warning really matters or not. But man, there sure are a lot of them!
I used to work with a guy who would regularly get upset about the idea of letting the compiler emit warnings, because he knew better and didn't want to be bothered with them.
Last I checked, he has a couple hundred points on the Hacker News internet forums.
Also, just last week I found and reported some undefined behavior in a major C++ package that's used by almost every player in as many as several industries. I don't expect it will ever make any difference in production, but it still snuck in.
"The amount of terrible C in the real world is enormous."
I'm sure you could say that about pretty much any programming language: "The amount of terrible X in the real world is enormous". There is also plenty of clean, nice, safe C code around (as in any other language); there's no need to over-generalize ("Almost no one else does").
> I'm sure you could say that about pretty much any programming language: "The amount of terrible X in the real world is enormous".
But the damage is far greater in C. In other languages you won't have arbitrary code execution or privilege escalation just because the programmer is not careful. Nor will there be, in other languages, so many nondeterministic bugs that show up once in a blue moon.
"In other languages you won't have arbitrary code execution or privilege escalation just because the programmer is not careful"
No, it's possible to make a system insecure with pretty much any language if the programmer is not careful: SQL injection, cross-site scripting, cross-site request forgery, and the list goes on.
Yes, you are right, void * is an exception. However, other pointer types cannot be reliably cast:
From C1X, section 6.3.2.3:
"A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined."
That is quite odd, though, since any pointer can be converted to void*, which only needs the alignment of char. So converting x* -> y* is undefined, but x* -> void* -> y* is defined.
I'm not trying to say it'll work; I'm trying to show that most non-trivial C programs invoke undefined behaviour according to the spec.
According to my reading, an intermediate void pointer allows the pointer conversion to stay well defined. However, this seems unsafe even without getting into dereferencing, because implementations are allowed to omit low-order bits from a stored pointer if they assume pointers are aligned.
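One common way to avoid the alignment question altogether is to copy the bytes instead of converting the pointer. A minimal sketch; `load_u32` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t from an arbitrary, possibly
   misaligned address. memcpy copies bytes, so no misaligned uint32_t*
   is ever formed and no uint32_t lvalue touches the source object. */
static uint32_t load_u32(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

At higher optimization levels, GCC and Clang typically compile this memcpy into a single load on targets that permit unaligned access, so the defined version usually costs nothing.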
I'd say my example demonstrates the spec's statement:
"A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined."
The resulting uint32_t pointer in my example is not correctly aligned for the referenced type, so undefined behavior (e.g., a trap on RISC) occurs. What's an example of a statement in a "non-trivial" C program that is in common use but you think is undefined?
int64_t a = 42;
void* p = &a;
int32_t* i = p;
printf("%i", *i);  // reads an int64_t object through an int32_t lvalue: strict aliasing violation
Implementation-defined, since type punning to char is legal (that's what allows memcpy to be implemented):
int64_t a = 42;
void* p = &a;
char* ch = p;
printf("%c", *ch);  // character-type access is allowed; which byte is read depends on byte order
Exercise left to the reader: Implement a "fast" memcpy (e.g. one that will copy more than 1 byte at a time for large copies, as your standard library implementation likely does) without violating strict aliasing rules.
Since I don't have a copy of the C standard handy, I'll reference http://stackoverflow.com/a/7005988/953531 instead, which covers the relevant sections of C++03, C++11, C99, and C11. Quoting the C99 version below (§6.5 ¶7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88):
* a type compatible with the effective type of the object,
* a qualified version of a type compatible with the effective type of the object,
* a type that is the signed or unsigned type corresponding to the effective type of the object,
* a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
* an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
* a character type.
73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Bullet 6 is what allows the second sample to have defined behavior. For the first sample, unless I'm seriously mistaken, int32_t isn't considered "a type compatible with" int64_t. Bullet 2 talks of "qualified" versions of types; I believe this refers to const/volatile-qualified types. Bullet 3 apparently allows you to type pun (unsigned int) to (signed int) or vice versa? That's an interesting bit of new trivia to me. Bullet 4 is much the same, bullet 5 requires a nonexistent union, and bullet 6 requires a character type.
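If that reading is right, a defined way to get at the 32-bit halves in the first sample is to copy the bytes, while bullet 3 is what makes reading an int through an unsigned int lvalue legal. A sketch with hypothetical names `split_i64` and `read_as_unsigned`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: defined alternative to reading an int64_t through an
   int32_t*. The bytes are copied, so no int32_t lvalue accesses the
   int64_t object. Which half holds the low-order bits depends on byte
   order; for a small non-negative value like 42, one half is 42 and
   the other is 0. */
static void split_i64(int64_t a, int32_t out[2])
{
    memcpy(out, &a, sizeof a);
}

/* Hypothetical: bullet 3 permits accessing an int object through the
   corresponding unsigned type. */
static unsigned int read_as_unsigned(const int *p)
{
    return *(const unsigned int *)p;
}
```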
Okay, good point: so it's the dereferencing step that invokes the clause you mentioned. What I learned today is that the cast itself is fully legal, even though it could produce an invalid pointer.
I still wonder if my snippet counts as undefined behavior, since it does dereference an "unknown" void pointer, which may have come from an incompatible object type.
> I still wonder if my snippet counts as undefined behavior, since it does dereference an "unknown" void pointer, which may have come from an incompatible object type.
It counts as potentially undefined behavior - depends on what you pass in. NULL? UB. Pointer-to-uint64_t? UB. Pointer-to-uint32_t? Perfectly defined behavior! ...well, assuming we use ip[0] = 123; instead of ptr[0] = 123;, which won't compile as I've just noticed.
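The function being discussed isn't quoted here, so this is only a hypothetical reconstruction of its shape (`set_first`, `ptr` and `ip` are assumed names), with the caller-dependent cases noted in comments:

```c
#include <stdint.h>

/* Hypothetical function: whether the store is defined depends entirely on
   what the caller passes in. */
static void set_first(void *ptr)
{
    uint32_t *ip = ptr; /* conversion requires ptr to be suitably aligned */
    ip[0] = 123;        /* defined if ptr points to a uint32_t object;
                           UB for NULL or for a uint64_t object */
    /* ptr[0] = 123; would not compile: a void* cannot be dereferenced. */
}
```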
That said, there are some ways of constructing a pointer where merely constructing it is undefined behavior:
int a[] = { 1, 2, 3 };
int* b = a+0; // Perfectly defined/legal/normal
int* c = a+3; // Perfectly defined/legal/normal, just don't dereference it (it points one past the end of the array)
int* d = a+4; // Undefined behavior. HAIL SATAN!
int* e = a-1; // Undefined behavior. Has also reportedly caused optimization-induced breakage in practice. HAIL GCC!
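When walking an array backwards, one way to stay within these rules is to start from the one-past-the-end pointer (legal, like c above) and decrement before each access, so a-1 is never formed. A minimal sketch; `sum_backwards` is a hypothetical name:

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints from last to first. p starts at a + n
   (one past the end, which may be formed but not dereferenced) and is
   decremented before each read, so it stays within [a, a + n]. */
static int sum_backwards(const int *a, size_t n)
{
    int total = 0;
    for (const int *p = a + n; p != a;) {
        --p;
        total += *p;
    }
    return total;
}
```

The tempting alternative, for (p = a + n - 1; p >= a; --p), ends by computing a - 1, which is exactly the case marked undefined above.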