by haberman 3688 days ago
> K&R book is the only book you need to read to know everything about C. All you need after you understand the fundamentals is a bit of discipline.

I am a huge C fan but this is not true at all. C has tons of pitfalls, especially with modern UB-aggressive optimizing compilers. There are a lot of rules you need to be aware of that are not naturally-occurring results of the fundamentals.


> especially with modern UB-aggressive optimizing compilers.

You put your finger on the problem: "modern UB-aggressive optimising compilers". C, the language, is actually quite simple (if not easy). The crazy stuff that compiler writers have been doing recently while aggressively mis-reading the C standard is the problem and does make things very complicated.

Why "misreading"?

From 1.1:

"The X3J11 charter clearly mandates the Committee to codify common existing practice."

Their emphasis, not mine. So is there a mandate to use the definitions of the standard to invalidate common existing practice? Clearly not. Yet that is what is happening.

More from the standard (defining UB):

"Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behaviour."

Does it say "Undefined behaviour gives implementors license to add new optimisations that break existing programs"? Clearly and unambiguously not.

See http://port70.net/~nsz/c/c89/rationale/a.html#1

Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990 (when the first version of the standard was published), as any optimization could potentially change the observable execution behavior of an erroneous program that contains UB.

> More from the standard (defining UB):

Your quote is not from the normative text of the standard, but from the non-normative rationale. Note however that it explicitly says that programs that contain undefined behaviors are erroneous, and that the implementation is not required to emit diagnostics for the UB. Pretty clearly this allows implementations to optimize erroneous programs into whatever they think is funny this week.

The normative text of the standard is pretty unambiguous:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
    for which this International Standard imposes no requirements
http://www.iso-9899.info/n1570.html#3.4.3
> Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990

Utter nonsense. I use that word carefully, but in this case it is absolutely appropriate.

Compiler optimisations, per an old but very useful definition, aren't allowed to change the visible behaviour of programs (in terms of output; obviously they are allowed to change execution times).

For example, even just a couple of years ago the compilers I used would actually execute a loop that sums the first n integers. Nowadays compilers detect this and replace the loop with the result. While this isn't particularly useful (probably the only reason you'd sum the first n integers in a loop is to take some measurements), it is (a) a perfectly legal optimisation and (b) one that appeared after 1990.

Unsurprisingly, you left out the second part of the (later) definition:

    NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
    results, to behaving during translation or program execution in a documented manner characteristic of the
    environment (with or without the issuance of a diagnostic message), to terminating a translation or
    execution (with the issuance of a diagnostic message).
Notably absent is "use the undefined behaviour to shave another 0.2% off my favourite benchmark".
> Unsurprisingly, you left out the second part of the (later) definition:

It is not part of the normative definition, which says "for which this International Standard imposes no requirements". In ISO standards, notes are without exception non-normative.

Although I think they really should add your proposed text as an additional example, as their current set of examples is evidently confusingly incomplete :-)

>Note however that it explicitly says that programs that contain undefined behaviors are erroneous

No it doesn't say that. It says that they are either "nonportable" or "erroneous". I'll take "nonportable" for 400, please.

As the "rationale" document points out, implementations are free to do something well-defined in the cases that the standard considers UB. For example, an implementation may document that it detects out-of-bounds array reads and these always return the value "0", and a hypothetical "C" program could rely on that. But implementations explicitly aren't required to do that, hence code that relies on a particular interpretation of UB in a particular implementation is nonportable, since it is a program written in an extended dialect of C, not ISO standard C.

Options like GCC's -fwrapv/-ftrapv and -fno-strict-aliasing are examples of language extensions that essentially turn UB into implementation-defined behavior.

Edit: Of course, you could argue that cases where hardware differences are a likely motivation, such as signed integer overflow, ought not to be UB in the first place but should instead be left implementation-defined in the standard. In that case, though, your issue is with the C standard committee, not with implementers.

Out of curiosity, do you have an example?

Maybe I live in a C reality distortion field. :)

There are so many to choose from. Here is one I just thought up:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head;
      do {
        struct node *next = tmp->next;
        free(tmp);
        tmp = next;
      } while (tmp != head);
    }
Can you spot the undefined behavior?

This is a great example because if it wasn't presented as "spot the UB", I'd expect very few people would raise a concern.

I've written up a demo with your code, running it through several analysers:

https://gist.github.com/technion/1b12c9b4581e915241d9483c5c2...

The tl;dr here is that tis-interpreter is a fantastic new tool, as it correctly complains about this.

Edit: I also note a departure from yesteryear, when every linting tool would only manage to complain about unchecked malloc() returns.

The `tmp != head` comparison is UB because `head` is a dangling pointer after the first loop iteration, right?

Yep! To do this properly requires something more like:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head->next;
      while (1) {
        if (tmp == head) {
          /* Has to be a separate case since even assigning
           * a dangling pointer is UB I believe? */
          free(tmp);
          break;
        } else {
          struct node *next = tmp->next;
          free(tmp);
          tmp = next;
        }
      }
    }
I'm not sure what you are trying to achieve by using the infinite loop. There's a more direct way.

    void free_circularly_linked_list(struct node *head)
    {
      struct node *a = head->next;
      while (a != head) {
        struct node *b = a->next;
        free(a);
        a = b;
      }
      free(head);
    }
Great example, by the way!
Why is it UB?

Let's say head's value is "10" and the memory at "10" is {..., next: "10"}

After the first iteration we will have:

Head: "10" Next: "10" Temp: "10"

With "10" pointing to freed memory. But why do we care? We are not dereferencing it, are we?

(I think I am missing something very obvious)

I think you’re missing the fact that "There are a lot of rules you need to be aware of that are not naturally-occurring results of the fundamentals."

I was indeed. Thanks for the insight!

Because the standard says that once an object is freed, the value of any pointer to it becomes indeterminate, so even comparing that dangling pointer is UB. That was haberman's point about the rules being non-intuitive.

Thanks. I wasn't aware of that. I stand corrected!!

For someone who wants to learn C from the ground up, do you have some kind of learning path or books you'd recommend?

I was a big fan of this post from a few days ago. It has a great list of resources and different areas to cover: http://blog.regehr.org/archives/1393

Thanks, that's what I was looking for.