Hacker News
by rhexs 3688 days ago
Low level languages are tough. Gaining a mastery of C does require knowing quite a few strange rules and quirks and it's certainly a bit harder than learning Python. The C FAQ does a good job of illustrating some of the more confusing parts. Sure, I wish I could write Go instead, but that isn't going to happen on the many embedded systems I work on.

This is a rather strange and insulting article. I'm not sure why Zed can't help "old programmers" nor do I understand why he's angered that individuals know about undefined behavior in C. Is there any background to this or did he have the misfortune of being insulted on IRC?

Edit: I googled for a bit and discovered this was in response to someone doing a pretty good job of technically reviewing the book for free! http://hentenaar.com/dont-learn-c-the-wrong-way Perhaps the title was a bit inflammatory.

Zed's rebuttal is at https://zedshaw.com/2015/09/28/taking-down-tim-hentenaar/ and is a great example of how not to react to constructive criticism. My favorite part is his safercopy function and the lack of size_t.
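For readers who haven't followed the thread: lengths and buffer sizes in C are naturally `size_t` (that is what `strlen()` and `sizeof` produce), so a careful bounded copy takes and returns `size_t`. A minimal sketch is below. `bounded_copy` is a hypothetical name, not the book's `safercopy` and not a standard library function, though it is similar in spirit to BSD's `strlcpy` (which instead returns the full length of the source).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the book or any library).
 * Copies at most dst_size - 1 bytes of src into dst and always
 * NUL-terminates dst when dst_size > 0.
 * Returns the number of bytes copied, excluding the terminator. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                 /* no room even for the NUL */

    size_t len = strlen(src);
    if (len > dst_size - 1)
        len = dst_size - 1;       /* truncate to fit */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```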

And finally, to leave us all with a quote from Zed's rebuttal:

"Over this next week I’m going to systematically take down more of my detractors as I’ve collected a large amount of information on them, their actual skill levels, and how they treat beginners. Stay tuned for more."

Wow.

Read the review over at hentenaar.com, though I've never read Zed's work.

All I can say is that the order of topics, the choice of topics and the quoted explanations would make for a very confused beginner, especially the crusade he seems to be on against strings and against functions being called incorrectly. That makes me think he should be teaching the language as it is, not the language he wishes it were. Of course, these are selective quotations, so I can't draw too many conclusions.

Going by my time teaching C, I wouldn't even mention Duff's device or safer, better strings at this level. There are better ways to introduce defensive programming, along with a discussion of the pros and cons.
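For anyone who hasn't met it, Duff's device interleaves a `switch` with a `do`/`while` loop so that a hand-unrolled copy can jump straight into the middle of the loop body to handle the leftover bytes. The sketch below is the commonly quoted memory-to-memory variant (Tom Duff's original wrote every byte to a single memory-mapped output register, so its `to` pointer was never incremented); `duff_copy` is an illustrative name. It is valid C, but the control flow is far from obvious, which supports the point that it doesn't belong in a beginner's book.

```c
#include <stddef.h>

/* Copy count bytes from `from` to `to`, unrolled eight ways.
 * The switch jumps into the loop body to copy the count % 8 leftover
 * bytes on the first pass; every later pass copies exactly 8. */
void duff_copy(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;   /* otherwise case 0 would run and copy past the buffers */

    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```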

Oh, I'm past 50 so am clearly "doomed" and beyond help. Not that I'm sure what I need help with. Oh well. :)

> nor do I understand why he's angered that individuals know about undefined behavior in C

If you read the first part of that same sentence, it should give you a clue.

> Low level languages are tough

I disagree. Low level languages, especially C, are the easiest to master. K&R book is the only book you need to read to know everything about C. All you need after you understand the fundamentals is a bit of discipline.

C++, on the other hand, is extremely difficult to master. Just have a look at the rules for rvalue references and you will see what I mean.

It may be easier for a complete novice to write code that doesn't crash in C++ than in C, but not to master it, or even to be good at it.

> K&R book is the only book you need to read to know everything about C. All you need after you understand the fundamentals is a bit of discipline.

I am a huge C fan but this is not true at all. C has tons of pitfalls, especially with modern UB-aggressive optimizing compilers. There are a lot of rules you need to be aware of that are not naturally-occurring results of the fundamentals.

> especially with modern UB-aggressive optimizing compilers.

You put your finger on the problem: "modern UB-aggressive optimising compilers". C, the language, is actually quite simple (if not easy). The crazy stuff that compiler writers have been doing recently while aggressively mis-reading the C standard is the problem and does make things very complicated.

Why "misreading"?

From 1.1:

"The X3J11 charter clearly mandates the Committee to codify common existing practice."

Their emphasis, not mine. So is there a mandate to use the definitions of the standard to invalidate common existing practice? Clearly not. Yet that is what is happening.

More from the standard (defining UB):

"Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behaviour."

Does it say "Undefined behaviour gives implementors license to add new optimisations that break existing programs"? Clearly and unambiguously not.

See http://port70.net/~nsz/c/c89/rationale/a.html#1

Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990 (when the first version of the standard was published), as any optimization could potentially change the observable execution behavior of an erroneous program that contains UB.

> More from the standard (defining UB):

Your quote is not from the normative text of the standard, but from the non-normative rationale. Note however that it explicitly says that programs that contain undefined behaviors are erroneous, and that the implementation is not required to emit diagnostics for the UB. Pretty clearly this allows implementations to optimize erroneous programs into whatever they think is funny this week.

The normative text of the standard is pretty unambiguous:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
    for which this International Standard imposes no requirements
http://www.iso-9899.info/n1570.html#3.4.3

> Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990

Utter nonsense. I use that word carefully, but in this case it is absolutely appropriate.

Under an old but very useful definition, compiler optimisations aren't allowed to change the visible behaviour of programs (in terms of output; they are obviously allowed to change execution times).

For example, even just a couple of years ago the compilers I used would execute a loop that sums the first n integers. Nowadays compilers detect this and replace the loop with the result. While this isn't particularly useful, because probably the only reason you're summing the first n integers in a loop is to do some measurements, it is (a) a perfectly legal optimisation and (b) happened after 1990.
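To make the example concrete, the two functions below compute the same value: one with a loop over 0..n-1 and one with the closed form n(n-1)/2, the kind of substitution described above. This is only a sketch of the transformation (the function names are illustrative); whether a particular compiler performs it depends on the compiler, version and flags. Unsigned arithmetic is used so that wraparound is defined behaviour rather than UB, which is what lets the two versions agree for every n.

```c
/* Sum of 0 + 1 + ... + (n - 1), computed the slow way. */
unsigned long long sum_loop(unsigned long long n)
{
    unsigned long long total = 0;
    for (unsigned long long i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same value in closed form, n(n - 1) / 2.
 * Halving whichever factor is even *before* multiplying keeps the result
 * identical to the loop even if the product wraps around, because
 * unsigned arithmetic is defined modulo 2^N. */
unsigned long long sum_closed_form(unsigned long long n)
{
    if (n % 2 == 0)
        return (n / 2) * (n - 1);
    return n * ((n - 1) / 2);
}
```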

Unsurprisingly, you left out the second part of the (later) definition:

    NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
    results, to behaving during translation or program execution in a documented manner characteristic of the
    environment (with or without the issuance of a diagnostic message), to terminating a translation or
    execution (with the issuance of a diagnostic message).

Notably absent is "use the undefined behaviour to shave another 0.2% off my favourite benchmark".

> Unsurprisingly, you left out the second part of the (later) definition:

It is not part of the normative definition, which says "for which this International Standard imposes no requirements". In ISO standards, notes are without exception non-normative.

Although I think they really should add your proposed text as an additional example, as their current set of examples is evidently confusingly incomplete :-)

> Note however that it explicitly says that programs that contain undefined behaviors are erroneous

No, it doesn't say that. It says that they are either "nonportable" or "erroneous". I'll take "nonportable" for 400, please.

As the "rationale" document points out, implementations are free to do something well-defined in the cases that the standard considers UB. For example, an implementation may document that it detects out-of-bounds array reads and these always return the value "0", and a hypothetical "C" program could rely on that. But implementations explicitly aren't required to do that, hence code that relies on a particular interpretation of UB in a particular implementation is nonportable, since it is a program written in an extended dialect of C, not ISO standard C.
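In portable ISO C, that hypothetical implementation's documented behavior has to be written out by the programmer instead. A minimal sketch, with `read_or_zero` as an illustrative name:

```c
#include <stddef.h>

/* Portable version of "out-of-bounds reads return 0":
 * the bounds check lives in the program, not in the implementation,
 * so no out-of-bounds access (and no UB) ever happens. */
int read_or_zero(const int *arr, size_t len, size_t index)
{
    return index < len ? arr[index] : 0;
}
```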

Options like GCC's -fwrapv/-ftrapv and -fno-strict-aliasing are examples of such language extensions: they give a documented, implementation-specific meaning to constructs that are UB in ISO C.
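To illustrate the aliasing case: reading a `float`'s bits through a `uint32_t *` violates the strict-aliasing rules (code like this is what -fno-strict-aliasing lets you get away with in practice), whereas copying the bytes with `memcpy` is well-defined ISO C. A minimal sketch, assuming `float` is a 32-bit IEEE 754 type (true on mainstream platforms, but not guaranteed by the standard); `float_bits` is an illustrative name.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check of the 32-bit float assumption. */
_Static_assert(sizeof(uint32_t) == sizeof(float), "assumes a 32-bit float");

/* Return the bit pattern of f.
 * The tempting  return *(uint32_t *)&f;  reads a float object through
 * an incompatible lvalue type, which is UB under the aliasing rules.
 * memcpy copies the object representation instead, which is allowed. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```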

Edit: Of course, you could argue that cases likely motivated by hardware differences, such as signed integer overflow, ought not to be UB in the first place and should instead be implementation-defined in the standard. But in that case your issue is with the C standards committee, not with implementers.
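For signed overflow specifically, the usual portable approach is to test *before* doing the arithmetic, so that no overflowing operation is ever evaluated. The common shortcut `if (a + b < a)` depends on the overflow actually happening, and since that is UB an optimizer may simplify the check away. A minimal sketch, with `checked_add` as an illustrative name (GCC and Clang also provide `__builtin_add_overflow` for this job):

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *out and return true, or return false if the sum
 * would not fit in an int. Every comparison uses in-range values,
 * so the overflowing addition is never evaluated. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```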

Out of curiosity, do you have an example?

Maybe I live in a C reality distortion field. :)

There are so many to choose from. Here is one I just thought up:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head;
      do {
        struct node *next = tmp->next;
        free(tmp);
        tmp = next;
      } while (tmp != head);
    }
Can you spot the undefined behavior?

This is a great example because, if it weren't presented as "spot the UB", I'd expect very few people to raise a concern.

I've written up a demo with your code, running it through several analysers:

https://gist.github.com/technion/1b12c9b4581e915241d9483c5c2...

The tl;dr here is that tis-interpreter is a fantastic new tool, as it correctly complains about this.

Edit: I also note a departure from yesteryear, when every linting tool would only manage to complain about unchecked malloc() returns.

The `tmp != head` comparison is UB because `head` is a dangling pointer after the first loop iteration, right?

Yep! To do this properly requires something more like:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head->next;
      while (1) {
        if (tmp == head) {
          /* head is freed last, then we stop at once: after free(),
           * a pointer's value is indeterminate, so even copying or
           * comparing it (head, or head->next) would be UB. */
          free(tmp);
          break;
        } else {
          struct node *next = tmp->next;
          free(tmp);
          tmp = next;
        }
      }
    }

Why is it UB?

Let's say the value of head is "10" and the memory at "10" is {..., next: "10"}

After the first iteration we will have:

Head: "10" Next: "10" Temp: "10"

With "10" pointing to freed memory. But why do we care? We are not dereferencing it, are we?

(I think I am missing something very obvious)
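The short answer to that question is C11 6.2.4p2: the value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime, so after free() even comparing `head` is UB, whether or not it is dereferenced. One way to sidestep the issue entirely is to cut the cycle before freeing anything, so the loop ends on NULL and never looks at a freed pointer. A sketch, assuming a minimal `struct node` with just a `next` member (the thread never shows the real definition); `free_circular_list` is an illustrative name, and it returns the node count only so the behaviour is easy to check.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed node layout; the real struct in the thread isn't shown. */
struct node {
    struct node *next;
};

/* Free every node of a circular list. The cycle is cut first, so the
 * loop stops on NULL and never compares against a freed pointer.
 * Returns how many nodes were freed. */
size_t free_circular_list(struct node *head)
{
    if (head == NULL)
        return 0;

    struct node *tmp = head->next;  /* first node after head */
    head->next = NULL;              /* cut the cycle: head is now the tail */

    size_t freed = 0;
    while (tmp != NULL) {
        struct node *next = tmp->next;  /* read before freeing */
        free(tmp);
        tmp = next;
        freed++;
    }
    return freed;
}
```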

For someone who wants to learn C from the ground up, do you have some kind of learning path or books you'd recommend?

I was a big fan of this post from a few days ago. It has a great list of resources and different areas to cover: http://blog.regehr.org/archives/1393

Thanks, that's what I was looking for.

These days I'd say the ISO standard is the only "book" you need to know everything about C.

According to http://www.open-std.org/jtc1/sc22/wg14/www/standards, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf is the most recent draft published before the official, purchase-only C11 standard. I don't know if it's identical, but it should be close, and it's free.

> K&R book is the only book you need to read to know everything about C.

Absolutely not.

I wouldn't say that C is easy to master, but it's not very difficult either.

The problem with C is that even a C master can't necessarily write correct code, because C is a very programmer-unfriendly language: it makes developers remember to do various things manually and perform error-prone calculations.
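As one illustration of the manual bookkeeping meant here: even joining two strings means remembering the extra byte for the terminating NUL and, to be thorough, guarding the size computation against wraparound. A minimal sketch; `concat` is an illustrative name, not a standard function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding a followed by b, or NULL on
 * failure. The caller must free() the result. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* la + lb + 1 must not wrap around (the +1 is for the NUL). */
    if (lb >= SIZE_MAX - la)
        return NULL;

    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);  /* lb + 1 also copies b's terminator */
    return out;
}
```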

C++ is definitely harder to master (after many years, I can't say I've mastered every corner of the language), but it's much easier to write correct code in C++, and it will be just as fast, run on as many platforms, etc.

C lost this battle a long time ago, it's surviving because of nostalgia, still having good street cred and inertia. The number of domains where one must use C is shrinking and now that we also have Go and Rust this will accelerate. All for the better, really.

> C lost this battle a long time ago... The number of domains where one must use C is shrinking

I doubt that. Kernels, drivers, embedded devices (not IoT) and the GNU world are all heavily C-oriented. Want to develop for a customer with an unknown Unix variant? Want to develop a tool everyone is going to use, whether on Linux, BSD or Solaris? C is the only option.

> but it's much easier to write correct code in C++ and it will be just as fast

Writing correct and fast C++ code at the same time was never an option; even today, with "safe" pointers, people are still confused about how to use shared_ptr<> correctly.

> now that we also have Go and Rust this will accelerate

Some places where C is still a strong contender:

* good tooling - debuggers, memory leak detectors, years of experience with compilers on various platforms

* well understood language - C has dark corners and they are documented well

* interfacing with everything else - from devices to libraries and languages

Rust can piggyback on almost any C tooling (it emits DWARF debug info) and has very strong C interoperability: if C can talk to it, Rust probably can too.

This can be summed up as: languages like Python are easy to learn but hard to master, whereas C is hard to learn but easy to master.

This is not true, because K&R does not address multithreaded programming at all.