by kinkdr 3688 days ago
> Low level languages are tough

I disagree. Low level languages, especially C, are the easiest to master. K&R book is the only book you need to read to know everything about C. All you need after you understand the fundamentals is a bit of discipline.

C++ on the other hand is extremely difficult to master. Just have a look at the rules for Rvalue references and you will see what I mean.

It may be easier for a complete novice to write some code that doesn't crash in C++ than in C, but not to master it, or even to be good at it.


> K&R book is the only book you need to read to know everything about C. All you need after you understand the fundamentals is a bit of discipline.

I am a huge C fan but this is not true at all. C has tons of pitfalls, especially with modern UB-aggressive optimizing compilers. There are a lot of rules you need to be aware of that are not naturally-occurring results of the fundamentals.

> especially with modern UB-aggressive optimizing compilers.

You put your finger on the problem: "modern UB-aggressive optimising compilers". C, the language, is actually quite simple (if not easy). The crazy stuff that compiler writers have been doing recently while aggressively mis-reading the C standard is the problem and does make things very complicated.

Why "misreading"?

From 1.1:

"The X3J11 charter clearly mandates the Committee to codify common existing practice."

Their emphasis, not mine. So is there a mandate to use the definitions of the standard to invalidate common existing practice? Clearly not. Yet that is what is happening.

More from the standard (defining UB):

"Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behaviour."

Does it say "Undefined behaviour gives implementors license to add new optimisations that break existing programs"? Clearly and unambiguously not.

See http://port70.net/~nsz/c/c89/rationale/a.html#1

Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990 (when the first version of the standard was published), as any optimization could potentially change the observable execution behavior of an erroneous program that contains UB.

> More from the standard (defining UB):

Your quote is not from the normative text of the standard, but from the non-normative rationale. Note however that it explicitly says that programs that contain undefined behaviors are erroneous, and that the implementation is not required to emit diagnostics for the UB. Pretty clearly this allows implementations to optimize erroneous programs into whatever they think is funny this week.

The normative text of the standard is pretty unambiguous:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
    for which this International Standard imposes no requirements
http://www.iso-9899.info/n1570.html#3.4.3

> Your interpretation of "codify common existing practice" would imply that no new compiler optimizations could be implemented since 1990

Utter nonsense. I use that word carefully, but in this case it is absolutely appropriate.

Compiler optimisations per an old but very useful definition aren't allowed to change the visible behaviour of programs (in terms of output, obviously they are allowed to change execution times).

For example, even just a couple of years ago the compilers I used would execute a loop that sums the first n integers. Nowadays compilers detect this and replace the loop with the result. While this isn't particularly useful, because probably the only reason you're summing the first n integers in a loop is to do some measurements, it is (a) a perfectly legal optimisation and (b) happened after 1990.
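
A rough C sketch of that transformation, for anyone who wants to see it concretely. The function names are made up for illustration, and whether a particular compiler actually rewrites the loop depends on its version and flags:

```c
#include <stdint.h>

/* The loop as a programmer would write it: 1 + 2 + ... + n. */
uint64_t sum_loop(uint32_t n) {
    uint64_t total = 0;
    /* 64-bit counter so that i <= n terminates even for n == UINT32_MAX. */
    for (uint64_t i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* What an optimiser may effectively substitute: the closed form n(n+1)/2.
 * The product fits in 64 bits for every 32-bit n. */
uint64_t sum_closed_form(uint32_t n) {
    uint64_t m = n;
    return m * (m + 1) / 2;
}
```

Both functions return the same value for every `n`, which is why the rewrite counts as a legal optimisation: only the execution time changes.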

Unsurprisingly, you left out the second part of the (later) definition:

   NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
    results, to behaving during translation or program execution in a documented manner characteristic of the
    environment (with or without the issuance of a diagnostic message), to terminating a translation or
    execution (with the issuance of a diagnostic message).

Notably absent is "use the undefined behaviour to shave another 0.2% off my favourite benchmark".

> Unsurprisingly, you left out the second part of the (later) definition:

It is not part of the normative definition, which says "for which this International Standard imposes no requirements". In ISO standards, notes are without exception non-normative.

Although I think they really should add your proposed text as an additional example, as their current set of examples is evidently confusingly incomplete :-)

> Note however that it explicitly says that programs that contain undefined behaviors are erroneous

No it doesn't say that. It says that they are either "nonportable" or "erroneous". I'll take "nonportable" for 400, please.

As the "rationale" document points out, implementations are free to do something well-defined in the cases that the standard considers UB. For example, an implementation may document that it detects out-of-bounds array reads and these always return the value "0", and a hypothetical "C" program could rely on that. But implementations explicitly aren't required to do that, hence code that relies on a particular interpretation of UB in a particular implementation is nonportable, since it is a program written in an extended dialect of C, not ISO standard C.

Options like GCC's -fwrapv/-ftrapv and -fno-strict-aliasing are examples of language extensions that are essentially implementation defined UB.
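
To make the strict-aliasing case concrete, here is a minimal sketch. The function name is hypothetical and it assumes `float` is 32 bits wide. Reading a `float` through a `uint32_t *` cast is the kind of code that tends to need -fno-strict-aliasing; copying the bytes with memcpy yields the same bits in plain ISO C:

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
 * The non-portable variant would be *(uint32_t *)&f, which violates the
 * aliasing rules; memcpy is the standard-conforming way to reinterpret bytes. */
uint32_t float_bits(float f) {
    uint32_t bits;
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "sketch assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform (an assumption, not something the C standard guarantees), `float_bits(1.0f)` is `0x3F800000`. Compilers typically turn the memcpy into a single register move, so the portable form need not cost anything.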

Edit: Of course, you could argue that behaviours likely motivated by hardware differences, such as signed integer overflow, ought not to be UB in the first place and should instead be left implementation-defined in the standard. But in that case your issue is with the C standard committee, not with implementers.
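
For signed overflow specifically, a hedged sketch of the difference (the function names are invented for illustration). The first check only works where overflow wraps, for example under -fwrapv; the second never overflows, so it is fine in standard C:

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound. In ISO C, a + b overflowing is UB, so a compiler
 * may assume it never happens and optimise the comparisons away.
 * Shown only for contrast; only meaningful in a dialect such as -fwrapv. */
bool add_overflows_nonportable(int a, int b) {
    int sum = a + b; /* UB in ISO C if the mathematical result does not fit */
    return (b > 0 && sum < a) || (b < 0 && sum > a);
}

/* Portable: compare against the limits before adding, so the addition
 * that could overflow is never evaluated. */
bool add_overflows(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}
```

A caller would use it as `if (!add_overflows(a, b)) total = a + b;` and handle the overflow case explicitly otherwise.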

Out of curiosity, do you have an example?

Maybe I live in a C reality distortion field. :)

There are so many to choose from. Here is one I just thought up:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head;
      do {
        struct node *next = tmp->next;
        free(tmp);
        tmp = next;
      } while (tmp != head);
    }

Can you spot the undefined behavior?

This is a great example because if it wasn't presented as "spot the UB", I'd expect very few people would raise a concern.

I've written up a demo with your code, running it through several analysers:

https://gist.github.com/technion/1b12c9b4581e915241d9483c5c2...

The tl;dr here is that tis-interpreter is a fantastic new tool, as it correctly complains about this.

Edit: I also note a departure from yesteryear, when every linting tool would only manage to complain about unchecked malloc() returns.

The `tmp != head` comparison is UB because `head` is a dangling pointer after the first loop iteration, right?

Yep! To do this properly requires something more like:

    void free_circularly_linked_list(struct node *head) {
      struct node *tmp = head->next;
      while (1) {
        if (tmp == head) {
          /* Has to be a separate case since even assigning
           * a dangling pointer is UB I believe? */
          free(tmp);
          break;
        } else {
          struct node *next = tmp->next;
          free(tmp);
          tmp = next;
        }
      }
    }

I'm not sure what you are trying to achieve by using the infinite loop. There's a more direct way.

    void free_circularly_linked_list(struct node *head)
    {
      struct node *a = head->next;
      while (a != head) {
        struct node *b = a->next;
        free(a);
        a = b;
      }
      free(head);
    }

Great example, by the way!
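
For anyone who wants to try this version, a self-contained sketch follows. The `struct node` layout, the `make_circular_list` helper, the NULL check, and the returned count are all assumptions added to make it testable; the version in the thread returns void.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Hypothetical helper: builds a circular list of n nodes.
 * Returns NULL if n == 0 or if an allocation fails. */
struct node *make_circular_list(size_t n) {
    struct node *head = NULL;
    struct node *tail = NULL;
    for (size_t i = 0; i < n; i++) {
        struct node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL) {
            /* The list is still NULL-terminated here, so unwind linearly. */
            while (head != NULL) {
                struct node *next = head->next;
                free(head);
                head = next;
            }
            return NULL;
        }
        fresh->value = (int)i;
        fresh->next = NULL;
        if (tail == NULL) {
            head = fresh;
        } else {
            tail->next = fresh;
        }
        tail = fresh;
    }
    if (tail != NULL) {
        tail->next = head; /* close the circle */
    }
    return head;
}

/* Frees every node and returns how many were freed. head is released
 * last, so every comparison against it uses a still-valid pointer. */
size_t free_circularly_linked_list(struct node *head) {
    if (head == NULL) {
        return 0;
    }
    size_t freed = 0;
    struct node *a = head->next;
    while (a != head) {
        struct node *b = a->next;
        free(a);
        a = b;
        freed++;
    }
    free(head);
    return freed + 1;
}
```

The single-node case works without special handling: `head->next == head`, so the loop body never runs and only `free(head)` executes.
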

Why is it UB?

Let's say head value is "10" and the memory at "10" is {..., next: "10"}

After the first iteration we will have:

Head: "10" Next: "10" Temp: "10"

With "10" pointing to freed memory. But why do we care? We are not dereferencing it, are we?

(I think I am missing something very obvious)

I think you’re missing the fact that "There are a lot of rules you need to be aware of that are not naturally-occurring results of the fundamentals."

Because the standard says even comparing a dangling pointer is UB, which was haberman's point about the standard being non-intuitive.

For someone who wants to learn C from the ground up, do you have some kind of learning path or books you'd recommend?

I was a big fan of this post from a few days ago. Has a great list of resources and different areas to cover: http://blog.regehr.org/archives/1393

Thanks, that's what I was looking for.

These days I'd say the ISO standard is the only "book" you need to know everything about C.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf is the most recent draft before the official, purchase-only C11 was published according to http://www.open-std.org/jtc1/sc22/wg14/www/standards. I don't know if it's identical, but it should be close, and it's free.

> K&R book is the only book you need to read to know everything about C.

Absolutely not.

I wouldn't say that C is easy to master, but it's not very difficult either.

The problem with C is that even a C master can't necessarily write correct code, because C is a very programmer-unfriendly language, making developers remember to do various actions manually and perform error-prone calculations.

C++ is definitely harder to master (after many years, I can't say I've mastered every corner of the language), but it's much easier to write correct code in C++ and it will be just as fast, run on as many platforms, etc.

C lost this battle a long time ago; it survives on nostalgia, lingering street cred, and inertia. The number of domains where one must use C is shrinking and now that we also have Go and Rust this will accelerate. All for the better, really.

> C lost this battle a long time ago... The number of domains where one must use C is shrinking

I doubt that. Kernels, drivers, embedded devices (not IoT), and the GNU world are all highly C oriented. Want to develop for a customer with an unknown Unix variant? Want to develop a tool everyone is going to use, whether on Linux, BSD, or Solaris? C is the only option.

> but it's much easier to write correct code in C++ and it will be just as fast

Writing correct and fast C++ code at the same time was never an option; even today, with "safe" pointers, people are still confused about how to use shared_ptr<> correctly.

> now that we also have Go and Rust this will accelerate

Some places where C is still a strong contender:

* good tooling - debuggers, memory leak detectors, years of experience with compilers on various platforms

* well understood language - C has dark corners and they are documented well

* interfacing with everything else - from devices to libraries and languages

Rust can piggyback on almost any C tooling (it emits DWARF debug info) and has very strong C interoperability: if C can talk to it, Rust probably can too.

This can be summed up as: Languages like Python are easy to learn but hard to master whereas C is hard to learn but easy to master.

This is not true because K&R does not address multithreaded programming at all.