by jbb555 3544 days ago
Agree entirely with all this.

You know what you get with C and it just works as you expect always. I always feel refreshed after writing C.

There are perhaps a few very minor enhancements I'd suggest, but I'd be very reluctant to open the floodgates and ruin it.


It always works as you expect? Really?

C is one of the few languages, along with C++, that revels in undefined behaviour. It can be very hard to reliably know what a C program will do if it's not written very carefully, because there are so many constructs that look benign but are technically wrong, and which the compiler will mercilessly exploit in order to optimise your program into nonsense.

This. While C and C++ are both very powerful languages and could still loosely be considered the "industry standard", neither is a language that will make your applications work as expected. I would say that C#/Java, or higher-level scripting languages like Lua or JavaScript, are the languages that will let you make "code it and it works" applications.
C and C++ are the only languages I regularly use wherein I need to resort to disassembly to debug the damn thing.
I find this statement hard to imagine. I've been coding professionally in C++ for just shy of 20 years now (and in C for a few years before that) and have only resorted to assembly when I needed to actually use assembly for performance reasons.

Actually - not completely true. I do like to look at the assembly from time to time for other reasons, but this is rare.

Even when I have performance to worry about, intrinsics are usually an option - possibly even a superior option - when writing performance sensitive code. But here are some of the cases where I resort to looking at program disassembly - I'll note a theme here of debugging optimized code (because who doesn't love heisenbugs):

1) Diagnosing a crash which turned out to be the result of a 1 byte vtable pointer corruption bug in an older codebase, which turned out to be a bad static_cast in relatively removed code (a good case for boost::polymorphic_downcast!). Simply understanding which pointer was bad in the first place required looking at the disassembly - when you can't rely on your debugger's results thanks to optimization.

2) Figuring out the actual values of variables in crashes and crash dumps of optimized builds to properly root cause a bug, when the debugger gets confused - or when the optimizer has inlined and reordered everything so aggressively that there are no sensible values to even display (so, most crash dumps.)

3) Noticing when the optimizer has reordered code "unexpectedly", alerting me to the fact that supposedly thread safe code is nowhere near safe and is in fact missing many memory barriers (possibly because their portable macros "helpfully" defaulted to a noop on whatever new and previously unrecognized platform I'm porting to.)

4) Noticing when the optimizer has removed or rewritten code in an "incorrect" manner, helping me debug code that would've worked if it hadn't technically invoked undefined behavior, so I can a) fix it, b) attempt to explain to my coworker that, yes, it's really undefined behavior, and yes, it's actually a problem (typically with a combination of citing the standard and linking INVALID WONTFIX-ed "bugs" in some compiler's bug database), c) be reasonably certain I've actually found the real root cause of a bug and fixed it.
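
For case 3, a minimal sketch of what the missing barriers look like when written with C11 <stdatomic.h> instead of per-platform macros. The names payload, ready, publish, and try_consume are made up for illustration, and this assumes a C11 compiler; the threads themselves are left out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* plain data handed from one thread to another */
static atomic_bool ready;  /* flag that publishes the payload (starts false) */

/* Writer side: store the data, then set the flag with release ordering so
   the payload write cannot be reordered after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: load the flag with acquire ordering. If it reads true, the
   payload written before the matching release store is visible. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

With a plain bool flag and no ordering, the compiler and the CPU are both free to move the payload write past the flag write, which is the kind of reordering that shows up in the disassembly.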

Now, yes, I'll admit this isn't 100% of my debugging sessions. And perhaps I'm an outlier. My coworkers generally learn that I can (eventually) tackle pretty much any weird bug they might be struggling with and that I'm happy to help. All the porting I've done reopens a whole codebase's worth of wounds - latent undefined behavior that another compiler's optimizer didn't take advantage of.

But on the other hand, I've been lucky enough to never encounter a codegen bug in all the compiler and linker bugs I've found. So far. That I know of. And while "rare" by incidence, these are the debugging sessions that can eat weeks at a time for a single bug, when sufficiently nasty and novel.

"It always works as you expect? Really?"

Well, it works as defined in the standards. It's just that not every programmer knows what to expect. C is a simple yet powerful language, and with power comes responsibility. It's not like you can throw some libraries/modules/objects (or whatever it is in other, safe languages) together, upload to a server and call it a day -- static testing, debugging, and unit testing are a vital part of any semi-serious C project.

I hear this, but I've rarely experienced it. Maybe I unconsciously avoid such cases with long use. I started with assembler, and debug C/C++ with disassembly turned on, so maybe that's why.
It's just really not that hard to avoid undefined behavior. Really. The worst is arguably integer overflow, and that's not even all that challenging.
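
As a sketch of the "not all that challenging" part: test against the limits before adding, so the overflowing expression is never evaluated. checked_add_int is a hypothetical name, not a standard function:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds a and b only if the result fits in an int.
   Returns true and stores the sum in *out on success. Returns false and
   leaves *out untouched if a + b would overflow. */
bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;  /* a + b is out of range; do not compute it */
    }
    *out = a + b;
    return true;
}
```

The catch is that every signed addition, subtraction, and multiplication on untrusted values needs this treatment, and nothing forces you to remember.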
C++ is far more prone to undefined behaviour, and even to compilation problems across systems with the STL.
Undefined behaviour is not as prevalent in real world code as reading articles from HN might make you think.
This is a very naive statement. Sure there are a handful of good companies that enable every single compiler warning, and fix those warnings, and then run the code through Coverity, and fix all those problems too. Almost no one else does. The amount of terrible C in the real world is enormous.
>> Sure there are a handful of good companies that enable every single compiler warning.

You think so? Every company I've worked for, or that I've known people that worked there, always enabled -Wall for their C and C++ code. Most OSS software compiles with all warnings enabled.

I think the issue with undefined behavior in C/C++ is extremely overblown. Aside from fun academic examples like 'what does i++++i++ evaluate to', there isn't actually all that much undefined behavior or gotchas in C/C++. I would say there are fewer, compared to other languages I know.

Signed overflow problems are everywhere, even in carefully written code. Using 'int' instead of a more specific type is a code smell. Security code which presumes that because you wrote ptr != NULL, that the check is actually carried out. Code that does type punning. Code that doesn't know about aliasing. It goes on and on.

You need to know that the problem exists in order to know that you have a problem. There are many C programmers who learned C back in the 1980s who don't even realize these are issues.

I'd say things have changed quite a bit since format string bugs...
> always enabled -Wall for their C and C++ code

Of course you want -Wall -Wextra -Werror -pedantic. ;)

...but please, for the love of Mike, don't ship source code with -Werror.

There's nothing like the experience of trying to fix somebody else's code which compiled fine on gcc version 8.97 but which now fails to compile on gcc version 8.98 because the new compiler has some new warnings, which it's now treating as errors.

...and you've got stuff to do, and the program isn't even broken.

Or if you don't have to use gcc, just -Weverything in clang.
I used to work with a guy who would regularly get upset about the idea of letting the compiler return warnings because he knew better and didn't want to be bothered with it.

Last I checked he has a couple hundred points on the hacker news internet forums.

Also just last week I found and reported some undefined behavior in a major c++ package that's used by almost every player in as many as several industries. I don't expect it will ever make any difference in production, but it still snuck in.

"The amount of terrible C in the real world is enormous."

I'm sure you could say that about pretty much any programming language: "The amount of terrible X in the real world is enormous". There is also plenty of clean, nice, safe C code around (as in any other language), so there's no need to over-generalize ("Almost no one else does").

> I'm sure you could say that about pretty much any programming language: "The amount of terrible X in the real world is enormous".

But the damage is far greater in C. In other languages you won't have arbitrary code execution or privilege escalation just because the programmer is not careful. Nor will there be, in other languages, so many nondeterministic bugs that show up once in a blue moon.

> In other languages you won't have arbitrary code execution or privilege escalation just because the programmer is not careful.

Sure you do. Remember the YAML fiasco with Ruby? How about the thousand-and-one RCE issues with PHP? eval isn't evil for no reason.

"In other languages you won't have arbitrary code execution or privilege escalation just because the programmer is not careful"

No, it's possible to make a system insecure with pretty much any language if the programmer is not careful: SQL injection, cross-site scripting, cross-site request forgery, and the list goes on.

Yeah, I do web development. I've worked with javascript, PHP, and, sigh, classic ASP.

There's bad code everywhere. Some languages make it a bit easier, but it's really not the language's fault.

There are very few programming languages where the total lines of code written is larger than the amount of bad C code written.
There are very few programming languages where the total lines of code written is even comparable to C, so of course there is more bad code too.
You're right - it's far more prevalent, if the bug database for basically every C or C++ codebase I've ever seen is any indication.
it's a heatmap.
There are some very interesting surveys and studies that suggest otherwise. Undefined behaviour due to integer overflow seems to be very common.

http://www.cs.utah.edu/~peterlee/papers/tosem15.pdf

I would tend to disagree. For example, casting void * to other pointer types is undefined, but this construct is often used, for example:

  void func(void *ptr)
  {
      uint32_t *ip = ptr;
      ip[0] = 123;
  }
This is wrong. Casting to and from void * is defined for all pointer types except function pointers, according to the standard.

Otherwise most every assignment after call to malloc would be undefined.

Yes, you are right, void * is an exception. However, any other pointer cannot be reliably cast:

From C1X, section 6.3.2.3:

"A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined."

Though that is quite odd, since any pointer can be converted to void* , which only needs alignment to the char type. So converting from x* -> y* is undefined, but x* -> void* -> y* is defined.

That might not necessarily work. If you have something like:

>> uint8_t x[100];

>> uint32_t *y = &x[1];

And then dereference y, most RISC architectures will trap on the unaligned access. It doesn't matter if there is an intermediate void pointer or not.
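
One way to sidestep the trap, assuming the goal is just to read four bytes at an arbitrary offset: copy them into a properly aligned object with memcpy, which has no alignment requirement. load_u32 is a hypothetical helper, and the value comes back in host byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t starting at any byte address.
   The bytes are copied into an aligned local, so no misaligned uint32_t
   pointer is ever created or dereferenced. */
uint32_t load_u32(const uint8_t *bytes)
{
    uint32_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

So for the array above, load_u32(&x[1]) takes the place of dereferencing y.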

To try and give some actual examples:

Undefined:

  int64_t a = 42;
  void* p = &a;
  int32_t* i = p;
  printf("%i", *i);
Implementation defined, as type punning to char is legal (allowing the implementation of memcpy):

  int64_t a = 42;
  void* p = &a;
  char* ch = p;
  printf("%c", *ch);
Exercise left to the reader: Implement a "fast" memcpy (e.g. one that will copy more than 1 byte at a time for large copies, as your standard library implementation likely does) without violating strict aliasing rules.
If you try and hit your finger with a hammer, your subsequent behavior is undefined. Please do not do that.
Where in the standard does it say your first example is undefined?
Since I don't have a copy of the C standard handy, I'll reference this which covers the relevant sections of C++03, C++11, C99, and C11: http://stackoverflow.com/a/7005988/953531 . Quoting the C99 version below (§6.5 ¶7):

  An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88):

  * a type compatible with the effective type of the object,
  * a qualified version of a type compatible with the effective type of the object,
  * a type that is the signed or unsigned type corresponding to the effective type of the object,
  * a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  * an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  * a character type.

  73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Bullet 6 is what allows the second sample to have defined behavior. For the first sample, unless I'm seriously mistaken, int32_t isn't considered "a type compatible with" int64_t. Bullet 2 talks of "qualified" versions of types - I believe this is referencing const/volatile qualified types. Bullet 3 apparently allows you to type pun (unsigned int) to (signed int) or vice versa? Which is an interesting bit of new trivia to me. Bullet 4 is much of the same, bullet 5 requires a nonexistent union, and bullet 6 requires a character type.
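
To illustrate bullet 6, a small sketch that inspects an int64_t one byte at a time through unsigned char *, which the rule quoted above permits. count_nonzero_bytes is a made-up helper for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the non-zero bytes in the object
   representation of an int64_t. Accessing the object through a character
   type is allowed, so this does not violate the aliasing rule. */
size_t count_nonzero_bytes(const int64_t *obj)
{
    const unsigned char *bytes = (const unsigned char *)obj;
    size_t count = 0;
    assert(obj != NULL);
    for (size_t i = 0; i < sizeof *obj; i++) {
        if (bytes[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Swap unsigned char for int32_t in that loop and you are back to the first sample: same bytes, but no bullet in the list covers the access.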
Casting pointers is well defined. What is undefined behavior is to dereference a pointer whose type does not match the pointed-to object.
That's not correct in general; casting pointers is possibly undefined. However, it does seem I made a mistake trying to use void * as an example.
One of the best things about C is it is not upgraded frequently like other languages. So there are no frequent <language> <x.y> released posts here like we see for other languages.
On the downside, when it does get updated, compilers take ages to implement the new features, and in the meantime make up busywork like "let's break OS kernels or crypto code to get faster in some random benchmark nobody cares about!"
Don't need C updates for that, a GCC upgrade is quite sufficient! :(
gcc is generally used as a testing ground for new C/C++ features. So in most cases, the compiler supports new features before they are 'officially released' into the language.

It's the complete opposite of waiting for a feature to appear in the compiler.

GCC didn't get sort-of-complete C11 features until 4.9 (2014) and still omits (largely useless) optional features.
> You know what you get with C and it just works as you expect always.

Every language works just as you expect if you have the right expectations.