by wvoq 5469 days ago
I don't intend to argue so much as to offer a data point: I suspect that most folks on this site learned C or something C-like early on, and have internalized its modus operandi. I have been learning C recently from a background of functional programming, and I find it scary and evil.

FWIW, I enjoy programming very much, and have built some non-trivial stuff in several languages. Still, I found C extremely taxing. Not in the sense that I felt it was beyond me, but in that I was fighting or recoiling from the language at practically every turn. In what follows, I am acutely aware that I am nowhere near fully fluent in C yet, and am writing only to offer the first impressions of a student. Nevertheless, in that time I have been able to draw on the advice of several "experienced colleagues", as K&R urge. In that sense, if I misstate the facts, I will be repeating the misconceptions of people who have objectively spent an awful lot of time writing C. I would be glad to be corrected on any point, but I'd also find that kind of symptomatic of the issues I have with C.

Firstly, C is incredibly stateful. I would be glad to learn that I am just Doing It Wrong, but there seems to be no obvious way around stateful manipulations as a way of life. Take, for instance, the fact that arrays are essentially second-class citizens and must be stuffed into functions as (pointers to) extra parameters in order to capture the "function's" "output". I am breaking out the scare quotes here since it is an abuse of vocabulary to refer to a procedure that communicates with the world by modifying its inputs as a function. If you have not cut your teeth on it, it seems almost obscurantist. If I want to multiply A by x and store the result in b, I want to write

b = matrixProduct(A,x);

not

matrixProduct(b, A, x);

as I must in C. I go back and forth over whether this is a deliberate and worthwhile performance tradeoff or just myopic design[1], but I don't want to have to settle for this in code I read every day.
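For concreteness, here is a minimal sketch of both calling styles, assuming a fixed 3x3 size. The names Vec3, Mat3, matrix_product_into and matrix_product are made up for illustration and come from no library. Wrapping the array in a struct is one way to get the first form, since C can return a struct by value even though it cannot return a bare array.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 3

/* Hypothetical wrapper types: a struct can be passed and returned by value,
   which a bare array cannot. */
typedef struct { double v[DIM]; } Vec3;
typedef struct { double m[DIM][DIM]; } Mat3;

/* Out-parameter style: the caller supplies storage for the result b. */
void matrix_product_into(Vec3 *b, const Mat3 *A, const Vec3 *x)
{
    assert(b != x);  /* the output must not alias the input vector */
    for (size_t i = 0; i < DIM; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < DIM; j++)
            acc += A->m[i][j] * x->v[j];
        b->v[i] = acc;
    }
}

/* Value style: the call site reads b = matrix_product(A, x). */
Vec3 matrix_product(Mat3 A, Vec3 x)
{
    Vec3 b;
    matrix_product_into(&b, &A, &x);
    return b;
}
```

The value style copies its arguments and its result, which is presumably the cost the out-parameter convention is meant to avoid for large arrays.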

There is also a kind of bureaucratic spirit in much C code, resulting from the fact that one must attend so closely to the how of computation rather than the what. In some contexts, like when you're doing distributed simulations that may run for days and performance is at an absolute premium, this emphasis may be appropriate. In most contexts, however, one finds oneself implementing and reimplementing standard operations by hand. Why should it take four lines to sum an array?
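To make the complaint concrete, this is roughly what summing an array looks like in C99. The function name sum_array is my own; the accumulator, the loop header, the loop body and the return are the four lines in question.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first n elements of xs. The length must be passed separately
   because the array argument arrives as a bare pointer. */
double sum_array(const double *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```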

One of the most alarming symptoms of this style can be witnessed by watching an experienced C programmer read code. Old hands don't read lines, they scan an entire section of a page at a time. I was awed by this ability until I realized that it is possible only because each line of C does so little. I recall someone (probably in an HN comment) describing the "rhythm" of reading C code. That, to me, is not an encouraging sign.

Then there is, of course, the penance of debugging segfaults and space leaks. What happens when you declare 10 pointers and allocate 9 of them? No one knows, because C doesn't know either. It's up to the compiler. Towers of Hanoi, nasal demons, &c. Even with the help of smart and experienced people, I've never spent so much time diagnosing such trivial, silent runtime errors. Yes, I know it's that way for a reason, but that reason is not legibility or ease of understanding.

In the end, C has performance and a relatively (but not exceptionally) compact semantics. The importance of cycle-squeezing is becoming less of a consideration daily, for reasons well-rehearsed on this site and elsewhere. As for C's semantics, I'd much rather spend my time thinking about the transformations I want to map over my data than orchestrating von Neumann machines to carry those transformations out. If your model of computation is something other than register machines, it's not quite cricket to describe the implementation as magic.

-----------------------------

[1] I'd like to qualify that remark in two ways. Firstly, I certainly recognize the brilliance of K&R for working C from the raw conceptual materials of the time. Secondly, language designers in 1969 did not have the benefit of the last 40 years of object lessons in readability. There are many potential languages--points in language space, if you will--that are semantically identical or near-identical but much more readable than the ANSI standard. For an example given by Kernighan himself, postfixing the dereference operator would have done miracles for legibility. Ultimately, a lot of more or less arbitrary choices were made early on and now it's too late to correct them.

5 comments

C is not the best way to manipulate information, as you have noticed. Its abstractions are all wrong. On the other hand, C is great for computer engineering, as opposed to computer science.

If you are writing firmware, for example, the exact sequence of memory writes is critical. Program the hardware registers in the wrong order, and the device doesn't work. Access the FIFO the wrong way, and your ISR has a data race. Hiding memory access from the programmer is useless when accessing memory correctly is the problem. C is pretty much the only usable language for this kind of work.
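As a rough sketch of what "program the registers in the right order" looks like, assume a made-up UART-like device. The register layout, the UART_ENABLE bit and the function name are all hypothetical; a real device's layout comes from its datasheet, and on some hardware you would also need memory barriers or vendor access macros, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for an imaginary UART-like device. */
typedef struct {
    volatile uint32_t ctrl;  /* control register */
    volatile uint32_t baud;  /* baud-rate divisor */
    volatile uint32_t data;  /* transmit data */
} uart_regs;

enum { UART_ENABLE = 1 };    /* assumed enable bit in ctrl */

/* volatile makes every store an observable access, so the compiler may not
   drop any of them or reorder them relative to one another. */
void uart_init_and_send(uart_regs *dev, uint32_t divisor, uint8_t byte)
{
    assert(dev != NULL);
    dev->ctrl = 0;            /* 1. disable the device while configuring */
    dev->baud = divisor;      /* 2. program the divisor */
    dev->ctrl = UART_ENABLE;  /* 3. enable only after configuration */
    dev->data = byte;         /* 4. only now write the data */
}
```

On real hardware dev would point at a fixed address from the memory map; here it can just as well point at an ordinary struct, which is handy for testing the sequence off-target.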

The guys writing kernels and system-level libraries face similar issues.

So, C is stateful because the hardware is stateful. C has raw pointers because the hardware has raw pointers. C doesn't manage memory for you because you don't want C to manage memory for you. This all makes sense when you realize that C was invented for writing operating systems.

So, while I do understand your criticism, I think you are looking at this from the wrong angle. The electrical engineering guys build the hardware, and the C guys make the hardware boot up. Fancy functional programming languages are useless without real machines to run on, and C makes those machines go. It's part of the plumbing, just like transistors. Plumbing may be messy and unpleasant, but even the architects designing skyscrapers need to know how it works.

I fully acknowledge that there are domains where C is the best/only tool for the job. I was specifically taking issue with the claim that C is lucid, and neither scary nor evil.
C is only opaque, scary, and evil if you consider computer hardware itself to be opaque, scary, and evil. For someone like an electrical engineer, C is perfectly lucid, safe, and friendly.
Late to that party, but I'll give my $0.02 regardless.

I won't defend C: it's all that you say. I think one of the acknowledged reasons for C's longevity is that it is impedance-matched for UNIX because they grew up together, and for various reasons UNIX is popular with hackers; therefore C is popular. C is deeply rooted within UNIX, partly because the ABI has been so stable for so long. If you want to write a library for UNIX, you generally target C because anything else will run into a quagmire of cross-compiler incompatibility issues. That means all the good libraries on UNIX are written in C or present a C ABI/API. The easiest language to use a C library from is... C. Or C++. So application writers (and tool writers) have often favoured C as well, although the rise of interpreted languages such as Python, Perl, and Ruby has changed that a bit. Also, the C/C++ toolchains have tended to be more advanced than those of other languages.

So, C is still with us, but for reasons that don't have as much to do with its merits as a language as with its merits as a platform (when coupled with UNIX).

If you're thinking of arrays as second class citizens, you're thinking of the language incorrectly. Rather you should realize there are no arrays, only pointers to chunks of memory and arithmetic operations.
There are arrays. C defines types. When you declare a variable as an array of int, C knows that variable is of type "array of int" and treats that differently than if it had been declared as "array of pointers to int" or "array of char" or "pointer to array of pointers to pointers to int".

I spent many years believing it when people made exactly the assertion you have (see my other posts on this article), but it wasn't until I tried to build a C compiler myself that I realized how wrong it is to think of arrays this way. Yes, C gives you the power to reference memory in a more or less arbitrary way. That does not mean that the arrays you declare are not arrays.

You're wrong; there ARE arrays: make int a[16], *aa; and compare sizeof(a) with sizeof(aa). The confusion arises from the fact that the VALUE of an array is a pointer to its first element. When you consider that C is a pure pass-by-value language, everything fits nicely into place.
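Spelled out as a small check (the function names are mine, purely for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* By the time an array argument reaches a callee it has been converted to a
   pointer to its first element, so sizeof here measures a pointer. */
size_t size_seen_by_callee(int *p)
{
    return sizeof p;
}

/* Returns 1 if the array and the pointer behave as described above. */
int arrays_are_not_pointers(void)
{
    int a[16], *aa;
    aa = a;                 /* the array's value: a pointer to a[0] */
    assert(aa == &a[0]);
    return sizeof a == 16 * sizeof(int)     /* size of the whole array */
        && sizeof aa == sizeof(int *)       /* size of one pointer */
        && size_seen_by_callee(a) == sizeof(int *);
}
```

The array type is visible where a is declared, and the conversion to a pointer happens when its value is used, for instance when it is passed to a function.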
sizeof behavior is just icing over what is really going on. You might as well argue that arrays really exist because we have the [] operator.
No, not really. What's really going on is that a has been declared as an array of 16 ints, while aa has been declared as a pointer to int. Those are different types, and C treats them differently sometimes (but not all times).
Aside from the additional information that the C compiler is capable of knowing about in the case of a, it is the same crap going on under the hood.
You're not writing in under-the-hood, though, you are writing in C. If you want to properly understand compiler warnings and errors you need to know that arrays and pointers are separate types.

Yes, "under the hood" it might be the same, but that's true of many languages if you dig deep enough. C is just a little closer. Yes, at one level a char declaration is just a smaller minimum memory allocation than int, but C will check both those types and if you want to use C properly you'll want to understand its typing and casting rules.

Well, QED.
Programming boils down to paying attention to details. C is an extremely small, simple, predictable and malleable [*] language that makes you painfully aware of that fact. For me, it is a pleasure to read well-written C code, for example the NetBSD kernel. There's a lot to learn there about elegance and good design.

[*] Malleable in the sense of bottom-up programming and defining your own "vocabularies". When I develop in C (and C++) I usually solve problems 75% bottom-up and 25% top-down. With a bottom-up approach, it is easy to stay focused on the problem at hand and write mostly bug-free code. The "top-down" bits just put the pieces of the puzzle together.

It is possible to program C with a functional mindset, but the syntax does get in the way. The function syntax does not distinguish parameters that are modified from those that are not, but you can mark the difference by convention, and you can use structs to bundle stuff up. You can pass function pointers liberally. In the end, though, the number of use cases for C is smaller now, and you should be able to avoid it for large projects and only use it for small pieces.
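A small sketch of that style, assuming a fixed-length vector. Vec, vec_map, vec_fold, square and add are illustrative names, not from any library. Inputs are taken by value and results are returned by value, so nothing the caller owns is modified.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_LEN 4

/* Bundling the array in a struct lets it travel by value. */
typedef struct { double v[VEC_LEN]; } Vec;

/* Apply f to every element and return a fresh Vec; xs is a private copy. */
Vec vec_map(double (*f)(double), Vec xs)
{
    assert(f != NULL);
    Vec out;
    for (size_t i = 0; i < VEC_LEN; i++)
        out.v[i] = f(xs.v[i]);
    return out;
}

/* Combine the elements left to right with f, starting from init. */
double vec_fold(double (*f)(double, double), double init, Vec xs)
{
    assert(f != NULL);
    double acc = init;
    for (size_t i = 0; i < VEC_LEN; i++)
        acc = f(acc, xs.v[i]);
    return acc;
}

double square(double x) { return x * x; }
double add(double a, double b) { return a + b; }
```

With these, a sum of squares reads vec_fold(add, 0.0, vec_map(square, xs)), though without closures or generics each new element type needs its own copy of the helpers.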
Actually, C89 can distinguish between parameters that are modifiable and those that are not. The syntax leaves a bit to be desired ("const foo_t *x", "foo_t *const x" and "const foo_t *const x" all have different semantics) but it can be done.
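To illustrate how const can mark what a function is allowed to modify through a pointer parameter, here is a minimal sketch. The definition of foo_t and the function names are placeholders of my own.

```c
#include <assert.h>

typedef struct { int value; } foo_t;  /* placeholder definition for illustration */

/* Pointer to const data: the callee may not write through x. */
int read_only(const foo_t *x)
{
    /* x->value = 0;   would be rejected by the compiler */
    return x->value;
}

/* Const pointer to mutable data: the callee may modify *x but not re-aim x. */
void bump(foo_t *const x)
{
    /* x = 0;          would be rejected by the compiler */
    x->value += 1;
}

/* Both: neither the pointer nor the data it points to may be changed. */
int read_fixed(const foo_t *const x)
{
    return x->value;
}
```

Only the first form matters to callers, since it promises their data is left alone; the const on the pointer itself only constrains the body of the function.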