Hacker News
Essential C (2003) [pdf] (cslibrary.stanford.edu)
183 points by th33ngineer 2119 days ago
17 comments

This, paired with their Pointers and Memory guide [1], is how I learned C in college. They're both pretty short and to the point; I would highly recommend them.

[1] http://cslibrary.stanford.edu/102/PointersAndMemory.pdf

That and other useful links are listed in the parent page of this pdf.

http://cslibrary.stanford.edu/101/

> the greatest pointer/recursion problem ever (advanceed)

Does 'advanceed' mean 'more advanced than advanced?' [1]

[1] https://www.youtube.com/watch?v=YAYKnnWCzto

> Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?

Although some feel it’s too opinionated to belong in this article, I really appreciated the above. Edit: To be clear, the author is advocating for simpler syntax here to increase program readability. This could be taken other ways.

> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.

Ugh. He's talking about inline use of post-increment and pre-increment (i.e. x++ and ++x) here. This is perfectly readable to a C programmer, and sidestepping them actually makes the code harder to understand.

Can you give an example where not using them 'inline' makes code harder to understand?
Consider the idiomatic way of iterating backward through an array:

  for(i=n; i-- > 0 ;)
    { /* operate on a[i] */ }
Converting i--; to a statement at the start of the block makes it less clear that it's part of the iteration idiom rather than an ad hoc adjustment that's specific to this particular logic. There are other examples, but they're either more involved or statementification is less obviously wrong.
Hmm, I think `for (i = n - 1; i >= 0; --i)` is way clearer and maybe more common?

edit: Ah unsigned underflow. :O

Yeah, so then you write

    for (size_t i = n-1; i < n; --i) { /* operate on a[i] */ }
It works fine (unsigned overflow is well defined) but it's even less clear.
It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'
size_t is unsigned? Since when?
That's the idiomatic way? Cool. The more straightforward-looking way,

    for(i = n-1; i >= 0; i--)
        { /* operate on a[i] */ }
breaks if i is unsigned, like a size_t.
Yep. That's why it's an idiom, rather than an obvious-way-of-doing-it-that-anyone-competent-would-use.
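
For anyone who wants to try the two unsigned-safe loops from this subthread side by side, here is a minimal sketch. The function names `reverse_copy` and `reverse_copy_wrap` are made up for illustration; both walk the array from the last element to the first with a `size_t` index and should do nothing when `n` is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Copies a[n-1], ..., a[0] into out and returns how many elements were
   visited. The index is unsigned (size_t), so a test like "i >= 0" would
   always be true; "i-- > 0" tests before decrementing and is safe for
   n == 0. */
size_t reverse_copy(const int *a, size_t n, int *out)
{
    size_t visited = 0;
    for (size_t i = n; i-- > 0; ) {
        out[visited++] = a[i];
    }
    return visited;
}

/* Same traversal using the wraparound form: when i is decremented past 0
   it becomes SIZE_MAX, which is not less than n, so the loop stops. */
size_t reverse_copy_wrap(const int *a, size_t n, int *out)
{
    size_t visited = 0;
    for (size_t i = n - 1; i < n; --i) {
        out[visited++] = a[i];
    }
    return visited;
}
```

Both versions rely on unsigned arithmetic being well defined; which one reads better is the matter of taste being argued here.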
You can't beat

    return x++;
:)
Why? This should be straightforward.
Imagine operating on something like a stack.

  x = *stack--; // pop 'x' off of the stack
  *++stack = y; // push 'y' onto the stack
This way is simple, direct, and it avoids inconsistent state.
I don't see how this proves the point.

For someone who doesn't have the operator precedence rules memorized, it isn't clear whether the above code means this:

    x = *stack;
    stack--;
or this:

    stack--;
    x = *stack;
Combining those two operations into one line is a trade-off I will never agree with. And I'm a fan of C myself: https://gist.github.com/cellularmitosis/3327379b151445c602ad... https://gist.github.com/cellularmitosis/d8d4034c82b0ef817913...

The two-liner is actually the one which is simpler and more direct, as it requires less knowledge of operator precedence rules. The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".

Many expert-level C programmers tend towards one-liners. Here's an example from the original "Red book":

    c = ((((i&0x8)==0)^((j&0x8))==))*255;
nooooo don't do it sadpanda.jpg
> The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".

It's not about performance, or thread safety, or anything like that; it's about having a coherent mental model of the code. A statement should, if possible, represent a single, complete operation. Invariants should not be violated by a statement, with respect to its environment. (This is more true for 'push' than 'pop'.) One way of solving that is to bundle the 'push' and 'pop' operations up into functions; someone else in this thread did that. But why bother with the mental overhead of a function call when you could just represent the operation directly? To be sure, there are cases where the abstraction is warranted, but a two- or three-line stack operation isn't abstraction, it's just indirection.

> For someone who doesn't have the operator precedence rules memorized, it isn't clear whether the above code means [snipped] or [snipped]

> The two-liner [...] requires less knowledge of operator precedence rules

It's not operator precedence—that's a separate issue; despite having implemented c operator precedence, I don't know all of them by heart—but simply behaviour of pre- and post-increment/decrement operations. It's even mnemonic—when the increment symbol goes before the thing being incremented, the increment happens first; else after—but even if not, it's a fairly basic language feature.

Even beyond that, though, it's an idiom. Code is not written in a vacuum. Patterns of pre- and post-increment fall into common use over time and become part of an established lexicon which is not specified anywhere. Natural language works the same way. Nothing wrong with that.

> It's not operator precedence—that's a separate issue

> It's even mnemonic—when the increment symbol goes before the thing being incremented, the increment happens first; else after—but even if not, it's a fairly basic language feature.

I think you missed the issue.

This is 100% about operator precedence, and has nothing to do with the decrement operator being in front of or behind the variable.

This expression:

    *stack--
means either this:

    (*stack)--
or this:

    *(stack--)
depending on the operator precedence rules.

If this is the layout of memory:

             ~~~~~~
    stack-1: | 52 |
    stack:   | 23 |
    stack+1: | 19 |
             ~~~~~~
(* stack)-- decrements the stored 23 to 22, while *(stack--) moves stack down to the slot holding 52.

https://godbolt.org/z/P7Ghfc
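
A small sketch for checking this locally, using the memory layout from the diagram. The struct and function names are hypothetical. Postfix operators bind tighter than unary `*`, so `*stack--` parses as `*(stack--)`. Note that because both forms are post-decrements, the expression itself yields the old element (23) either way; what differs is whether the pointer or the pointed-to value gets modified.

```c
#include <assert.h>

/* Hypothetical record of what one expression did. */
struct pop_result {
    int value;       /* value the expression produced */
    int moved_down;  /* 1 if the pointer itself was decremented */
    int slot_after;  /* contents of the original slot afterwards */
};

/* Applies the unparenthesised form from the thread. */
struct pop_result demo_pop(void)
{
    int mem[3] = {52, 23, 19};   /* stack-1, stack, stack+1 */
    int *stack = &mem[1];

    struct pop_result r;
    r.value = *stack--;          /* parses as *(stack--) */
    r.moved_down = (stack == &mem[0]);
    r.slot_after = mem[1];
    return r;
}

/* Explicit parentheses the other way: decrement the pointed-to int. */
struct pop_result demo_decrement_value(void)
{
    int mem[3] = {52, 23, 19};
    int *stack = &mem[1];

    struct pop_result r;
    r.value = (*stack)--;        /* yields the old value, then stores 22 */
    r.moved_down = (stack == &mem[0]);
    r.slot_after = mem[1];
    return r;
}
```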

Saving characters on spacing is a terrible thing to do. In fact, that jumble is missing a zero on the equality, which is made less evident because the characters are not spaced in a way that makes this mistake obvious.

    int pop_int () 
    {
      int x = *stack; 
      --stack;
      return x;
    }

    void push_int(int x)
    {
      ++stack;
      *stack = x;
    }

Genuine questions:

- Is this worse?
- How does the state get inconsistent?

For one, it’s three and two lines for what are two logical operations. I assume the “inconsistent state” is the time between the lines where the stack is not truly in the right state; many people prefer to preserve their invariants as much as possible.
it will produce indistinguishable assembly language, no?
I highly recommend the CS50 course to get familiar with C:

https://www.youtube.com/playlist?list=PLhQjrBD2T381L3iZyDTxR...

Sure, it doesn't go into detail about the language, but you get the essentials and the videos are great.

I'm doing CS50x at the moment and I can definitely recommend it. It got me interested in C despite trying to avoid it my entire life. David Malan is one of the best lecturers I've seen.
It's how I started my career 7 years ago. Amazing course and lecturer
There is also Jens Gustedt's Modern C:

https://modernc.gforge.inria.fr/#org81433c2

I prefer something that is more up to date https://nostarch.com/Effective_C
While we are here, does anyone have a good resource about memory management strategies in C?

Topics:

* Best practices for C function signatures (caller allocates (which size?), callee allocates (where? which allocator?))

* Memory Ownership Models

* Borrowing

* Reference Counting

* Garbage Collectors and C-Libraries providing this functionality

* Interning Objects (Strings)

* RAII [1]? And its benefits/flaws

[1] https://en.wikipedia.org/wiki/Resource_acquisition_is_initia...

Context: I feel that understanding the C memory primitives is not that hard (stack variables, malloc/free, C++'s new). But how to use them is devilishly tricky. I have seen little information about this.

Shameless plug:

This generally discusses the lack of RAII in C (towards the end), and what to do about it:

https://floooh.github.io/2019/09/27/modern-c-for-cpp-peeps.h...

...and this presents a (reasonably runtime-safe) general memory management strategy using tagged-index-handles instead of pointers:

https://floooh.github.io/2018/06/17/handles-vs-pointers.html

The gist is basically:

- don't allocate small chunks of memory all over the code base, instead move memory management into few centralized systems, and let those systems own all memory they allocate

- don't use pointers as public "object references", instead use "tagged index handles"

- don't use "owning pointers" at all, use pointers only as short-lived "immutable borrow references"
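
To make the gist above a bit more concrete, here is a rough sketch of a fixed-size pool that hands out tagged index handles. It is not taken from the linked posts: the names (`pool_t`, `handle_t`, `pool_alloc`, `pool_lookup`, `pool_free`), the 16/16-bit split between generation and index, and the `int` payload are all assumptions chosen to keep it short. The pool must start zero-initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4u

/* Handle = generation in the upper 16 bits, slot index in the lower 16.
   An id of 0 is reserved to mean "invalid". */
typedef struct { uint32_t id; } handle_t;

typedef struct {
    uint16_t generation;  /* bumped on every free so stale handles fail */
    int      in_use;
    int      payload;     /* stand-in for the real object data */
} slot_t;

/* The pool owns all of its memory; callers only ever hold handles. */
typedef struct { slot_t slots[POOL_SIZE]; } pool_t;

handle_t pool_alloc(pool_t *p, int payload)
{
    for (uint16_t i = 0; i < POOL_SIZE; i++) {
        slot_t *s = &p->slots[i];
        if (!s->in_use) {
            s->in_use = 1;
            if (s->generation == 0) s->generation = 1;  /* keep id != 0 */
            s->payload = payload;
            return (handle_t){ ((uint32_t)s->generation << 16) | i };
        }
    }
    return (handle_t){0};  /* pool exhausted */
}

/* Returns a short-lived read-only pointer, or NULL if the handle is
   invalid or refers to a slot that has since been freed. */
const int *pool_lookup(const pool_t *p, handle_t h)
{
    uint16_t index = (uint16_t)(h.id & 0xFFFFu);
    uint16_t gen   = (uint16_t)(h.id >> 16);
    if (h.id == 0 || index >= POOL_SIZE) return NULL;
    const slot_t *s = &p->slots[index];
    return (s->in_use && s->generation == gen) ? &s->payload : NULL;
}

void pool_free(pool_t *p, handle_t h)
{
    if (pool_lookup(p, h) != NULL) {
        slot_t *s = &p->slots[h.id & 0xFFFFu];
        s->in_use = 0;
        s->generation++;  /* invalidates every outstanding copy of h */
    }
}
```

The point of the generation tag is that a use-after-free turns into a NULL lookup instead of a dangling pointer. A 16-bit generation eventually wraps, so this is only reasonably safe, not airtight.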

Another global (imperative style) vs object (code and data by object) vs functional ...
Thanks for this!
"Teach Yourself C in 24 Hours" by Tony Zhang (1997) seems interesting as well:

http://aelinik.free.fr/c/

A previous HN discussion on that book:

https://news.ycombinator.com/item?id=15624521

EDIT: That earlier discussion has an excellent first post. Quoting:

"It bothers me so much that very few books (Kernighan) talk about WHY. WHY. WHY is a variable needed? WHY is a function needed? WHY do we use OOP? Every single book out there jumps straight into explaining objects, how to create them, constructors, blah blah blah. No one fricking talks about what's the point of all this?

Teaching syntax is like muscle memory for learning Guitar. It is trivial and simply takes time. Syntax - everyone can learn and it is only one part of learning how to code. Concepts are explained on their own without building upon it.

[... A list with learning resources the poster finds great ...]

This is learning how to produce music. Not learning the F chord. Teaching how to code is fundamentally broken and very few books/courses do it well."

> Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?

I never really got to the point of learning Haskell or Lisp until recently; it was always this: I can do everything with C/C++/Java/Python, and I could. But the thing is, it was only after learning Lisp that I really got the hang of thinking in a top-down manner (recursively), and for that matter it took Haskell to teach me composition intuitively, which could then be extended to my main language (C++). I understand that syntax doesn't matter much, but fwiw I still think in terms of Lisp syntax when writing recursive code in C++/C. So yeah, take that for what you will.

This guide is short (which is always nice) but has a couple of flaws in places, going by the brief skim I gave it. For example:

> In particular, if you are designing a function that will be implemented on several different machines, it is a good idea to use typedefs to set up types like Int32 for 32 bit int and Int16 for 16 bit int.

Use <stdint.h> please

> The char constant 'A' is really just a synonym for the ordinary integer value 65 which is the ASCII value

Not always, especially right after you came off a paragraph explaining how different machines have implementation-specific behaviors

> The compiler can do whatever it wants in overflow situations -- typically the high order bits just vanish.

This is a good time to explain what undefined behavior actually means

> The // comment form is so handy that many C compilers now also support it, although it is not technically part of the C language.

Part of the language since C99

> C does not have a distinct boolean type

_Bool since C99

> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.

I'm fine with you mentioning that this can be tricky, but this is more opinion than I am comfortable with in an introductory text

> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.

Under the assumption that there are no boolean types from earlier, this is not true

> The do-while is an unpopular area of the language, most everyone tries to use the straight while if at all possible.

I would argue that people use do-while more than they need to

> I generally stick the * on the left with the type.

Not a problem, but :(

> The & operator is one of the ways that pointers are set to point to things. The & operator computes a pointer to the argument to its right. The argument can be any variable which takes up space in the stack or heap

And constants/globals

> To avoid buffer overflow attacks, production code should check the size of the data first, to make sure it fits in the destination string. See the strlcpy() function in Appendix A.

strlcpy is non-standard and probably not what you want

> The programmer is allowed to cast any pointer type to any other pointer type like this to change the code the compiler generates.

> p = (int * ) ( ((char * )p) + 12); // [Some spaces added by me to prevent Hacker News from eating the formatting]

Only in some very specific cases…

> Because the block pointer returned by malloc() is a void* (i.e. it makes no claim about the type of its pointee), a cast will probably be required when storing the void* pointer into a regular typed pointer.

Casting malloc is never required (and I would say usually not a good thing to do)
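
For what it's worth, a minimal sketch of two of the points above: fixed-width types from <stdint.h> instead of hand-rolled Int32 typedefs, and malloc without a cast. The function name `make_buffer` is made up for illustration, and the size multiplication is not checked for overflow to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates n zero-filled 32-bit integers; the caller frees the result.
   No cast on malloc: in C, void* converts implicitly to any object
   pointer type. "sizeof *buf" keeps the element size tied to the
   declaration of buf rather than repeating the type name. */
int32_t *make_buffer(size_t n)
{
    int32_t *buf = malloc(n * sizeof *buf);  /* no overflow check here */
    if (buf == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = 0;
    }
    return buf;
}
```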

>> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.

> Under the assumption that there are no boolean types from earlier, this is not true

Actually, I believe _Bool is guaranteed to use 0 for false, and any non-0 value is stored as 1 for true. Arithmetic on _Bool is also guaranteed, based on those values.

For example, I believe the standard guarantees:

    _Bool x = 255;
    assert(x == 1);

    size_t y = 10 + x;
    assert(y == 11);
This is exactly why I included the earlier bit where they claimed there was no boolean type, to show that their conclusion is logically inconsistent rather than just incomplete ;)
>> I generally stick the * on the left with the type.

> Not a problem, but :(

In a declaration, * is a type modifier. E.g. `int a;` declares a variable of type "int" named "a". `int* a;` declares a variable of type "pointer to int" named "a".

The only time this doesn't work is if you stick multiple declarations on the same line. That's annoying to me, because it means you're breaking the "declare variables as close to their first use as possible" practice. It's not K&R C any more, you don't need to declare everything all at once at the top of the scope.

Also, to prove that `*` in a declaration is part of the type, note that K&R style function declarations (no argument names, just types) are still valid C (though I'd strongly discourage their use). So `void func(int* a, int* b);` is identical to `void func(int*, int*);`. It's very different from `void func(int, int);` that you'd get if you assume the `*` goes with the `a`.
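
The multiple-declarations case mentioned above is easy to check with C11's `_Generic`, which selects a branch from the static type of its operand. A small sketch (the macro and function names are hypothetical):

```c
#include <assert.h>

/* Evaluates to 1 if x has type int*, 0 for any other type. */
#define IS_INT_POINTER(x) _Generic((x), int *: 1, default: 0)

/* Reports whether the first name in "int* a, b;" is a pointer. */
int first_declarator_is_pointer(void)
{
    int* a, b;   /* a is int*, b is plain int */
    b = 0;
    a = &b;
    return IS_INT_POINTER(a);
}

/* Reports whether the second name in "int* a, b;" is a pointer. */
int second_declarator_is_pointer(void)
{
    int* a, b;   /* the * attaches to the declarator a only */
    b = 0;
    a = &b;
    (void)a;
    return IS_INT_POINTER(b);
}
```

So `int* a;` reads fine on its own line, but the grammar still attaches the `*` to each declarator separately.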

It took Microsoft 16 years to support most of C99 in MSVC, and they are still not completely done after 21 years. I think for a document last updated in 2003 it's ok not being based on C99 ;)
> Under the assumption that there are no boolean types from earlier, this is not true

Can you elaborate on this one? I thought && and || expressions always evaluated to 0 or 1.

C99 added type _Bool (also called bool if you #include <stdbool.h>) -- but it doesn't actually use it much. Operators that yield logically "boolean" values (<, <=, >, >=, ==, !=, &&, ||, !) yield values of type int, not _Bool. The value is always 0 or 1 -- in contrast with isdigit(), for example, which is specified to return 0 for false and some unspecified non-zero value for true.

Converting any scalar value (that includes pointers) to _Bool yields 0 or 1 (false or true).
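
A quick sketch of those guarantees, assuming a C11 compiler for `_Generic`. The helper names are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* && yields an int that is exactly 0 or 1, whatever the operands are. */
int logical_and_value(int a, int b) { return a && b; }

/* 1 if the result of a comparison has type int, 0 otherwise. */
int comparison_has_type_int(void)
{
    return _Generic((1 < 2), int: 1, default: 0);
}

/* Converting any scalar to bool normalises it to 0 or 1. */
int normalised(int x) { return (bool)x; }

/* isdigit() only promises "non-zero" for true, so compare against 0
   rather than against 1. */
int is_digit01(char c) { return isdigit((unsigned char)c) != 0; }
```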

Now that I think about it, I think that depends on what you mean by "use". I was commenting from a perspective that you can pass in something that is not 0 or 1, and in general it is not advised to assume that a "boolean" is 0 or 1 especially given that this document doesn't mention the boolean type (which is guaranteed to have those values). This is true even under ANSI C, because as it mentions later, programmers depend on any nonzero value being "truthy".
Normal C is not difficult. But once you get into using it for, e.g., hacking the Game Boy (I still remember the confusion over the data, tiles, etc.) or basic bare-metal work on small machines, it's just hard.
I don't want to learn C, but for certain types of applications there's really no practical alternative.
I typically see Rust marketed as a better replacement for C. In what cases isn't it a practical replacement?
Rust is a fine replacement for C++, but not really for C, and the reasons why Rust can't be a replacement for C are very similar to why C++ can't be a replacement for C. Everybody who chooses C today has slightly different reasons to do so, but one important reason is that C is a very small language with a very small standard library, and most parts of the standard library can be ignored without losing any of the "qualities" of the C language. IME a small language feature set makes it easier to evaluate and integrate third party code, it's hard to say exactly why that is, but that's my experience anyway. Of course one could say "nobody forces you to use the whole feature set of Rust", but the same has been said in the C++ world for decades. The problem is that everybody selects their own subset of the language when having the choice.
Core Rust is a very small language. It doesn't have a memory allocator, so it can be used in embedded environments where a heap is not available. C++ cannot do that.

https://doc.rust-lang.org/core/index.html

It's a replacement for C++, not C.
When you’re targeting platforms that Rust doesn’t, have a lot of legacy code that can’t be moved to a new toolchain, or cannot handle dependencies.
I learned from Sam's C Primer Plus.

https://www.amazon.com/Primer-Plus-5th-Stephen-Prata/dp/0672...

My version was older than the Amazon version as I learned in 1987.

This was the PDF used in my UNIX class! That brings back some memories...
A very great resource!
I also liked his "Essential Perl" back in the day: http://cslibrary.stanford.edu/108/EssentialPerl.pdf
This is a pretty neat guide if you’re cheap and have moral qualms about pirating K&R. Still, I think the best introduction to C remains K&R.
K&R is woefully out of date. It gives you no info on how to do things safely and sanely, and it encourages a leet style of programming that results in catastrophic edge-case bugs, as you can see in the comments above, where naive code that iterates backwards through an array fails when the array size is 0. Worse, the K&R leet style buys you absolutely nothing with an optimizing compiler written in the last 30 years.
K&R is actually fairly reasonable about performance, and much less l33t than code I have seen in the wild. It is a very good introduction about the language, and although I would not ever call it "woefully out of date" I would say that it is a good idea to read more about the current state of bugs and tooling, which is not discussed because the book is a general overview of the language.
Eh, K&R Second Edition is still a very useful book today. The only downside is that it stops at C89 and hasn't been updated for C99.
I hate guides showing me how to do things safely and sanely from the get-go. Show me how to do it. If safety and sanity are priorities for me, I’ll seek those out on my own.