by pkkm 1070 days ago
1. It gives me a lot of control over how the program works, which lets me create programs that work faster and use less memory than would be possible in most other languages.

2. Relatedly, it's more explicit than almost any other language. If a line of code doesn't look like a function call, it's not calling anything. There is no hidden control flow. These statements are not true in languages which support operator overloading or exceptions. The only real competitor to C here is Zig.

3. If I give a Linux user the source of a C program, they can probably compile it with the tools they already have. This will most likely be the case 20 years from now too, as long as I keep my C mostly standard-compliant. I'm not sure that code in newer, faster-moving languages like Rust will stay compilable as long.

4. It's a lingua franca. C libraries can be used from most programming languages without too much effort.

I probably wouldn't start a large project on a tight deadline in C, but I think it's a great language for writing new command-line utilities and for rewriting tricky algorithmic code from scripting languages. I've gotten 100x and even 1000x speedups from replacing a couple of Python functions with C.

The ease of use is about to improve with the C23 standard, which I'm very happy with. On the other hand, some tricky areas like aliasing are likely to stay tricky forever.

8 comments

> It gives me a lot of control over how the program works, which lets me create programs that work faster and use less memory than would be possible in most other languages.

While it is true to a degree, I would also add that due to its low level of expressivity, you often have to introduce less efficient solutions simply because of language deficiencies. Things like small string optimizations in C++ are simply not possible in C.

2 is true, but it comes at the expense of expressivity; see the previous point.

3. Well, will it really compile to what you meant? If you have UB, it might still compile but the semantics of your program could change entirely depending on which compiler and which version you use.

Also, your Python point: well, that’s because you used Python in the first place, which is very slow even among scripting languages.

> Things like small string optimizations in C++ are simply not possible in C.

I don't think this is true, I've seen a bunch of libraries implement SSO in C:

https://nullprogram.com/blog/2016/10/07/

https://github.com/stclib/STC/blob/master/include/stc/cstr.h

https://github.com/mystborn/sso_string/blob/master/include/s...
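
For readers who don't want to click through, here is a minimal sketch of the idea, not the API of any of the libraries linked above. Every name in it (`sso_str`, `sso_init`, `sso_cstr`, `sso_free`, `SSO_CAP`) is hypothetical, and the 23-byte inline capacity is just an assumption for illustration. Short strings live inside the struct; longer ones fall back to `malloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a small-string-optimized string in C. */
enum { SSO_CAP = 23 };              /* longest string stored inline, excluding the NUL */

typedef struct {
    size_t len;
    int on_heap;                    /* 0: bytes are in u.small, 1: bytes are in u.heap */
    union {
        char small[SSO_CAP + 1];
        char *heap;
    } u;
} sso_str;

/* Initialize *s with a copy of src. Returns 0 on success, -1 if malloc fails. */
int sso_init(sso_str *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);

    s->len = n;
    if (n <= SSO_CAP) {             /* short string: no allocation at all */
        s->on_heap = 0;
        memcpy(s->u.small, src, n + 1);
        return 0;
    }

    char *p = malloc(n + 1);        /* long string: one heap allocation */
    if (p == NULL) {
        s->len = 0;
        s->on_heap = 0;
        s->u.small[0] = '\0';
        return -1;
    }
    memcpy(p, src, n + 1);
    s->u.heap = p;
    s->on_heap = 1;
    return 0;
}

/* Borrow the contents as a NUL-terminated C string. */
const char *sso_cstr(const sso_str *s)
{
    return s->on_heap ? s->u.heap : s->u.small;
}

/* Release heap storage (if any) and reset to the empty inline string. */
void sso_free(sso_str *s)
{
    if (s->on_heap)
        free(s->u.heap);
    s->len = 0;
    s->on_heap = 0;
    s->u.small[0] = '\0';
}
```

Real implementations usually pack the flag and length more tightly; this version trades a few bytes for readability.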

> due to its low level of expressivity, you often have to introduce less efficient solutions simply because of language deficiencies. Things like small string optimizations in C++ are simply not possible in C.

You don't have to use an inefficient solution. You can always roll your own optimized solution or use a library. I agree that C++ has some nice string optimizations built into the standard library, but it's not obvious to me that they're always better than the simplicity and predictability of a simple chunk of memory.

Besides, you generally don't write C code in the same way you write C++ but with more primitive tools. You often allocate a buffer once and operate on it; you don't emulate passing strings by value from function to function, doing lots of allocations and deallocations in the process.

> Well, will it really compile to what you meant? If you have UB, it might still compile but the semantics of your program could change entirely depending on which compiler and which version you use.

I'm not sure what point you're making here. If you have bugs in your program then it may work incorrectly, yes, but that's true no matter the language.

> You don't have to use an inefficient solution. You can always roll your own optimized solution or use a library

That’s not true. For example, you can’t write a generic, efficient vector implementation in C; the language itself can’t express it. You either have to copy-paste the same code for different element sizes, or use some monstrous hack of a macro. Instead, projects use hacks like conventionally placing the next/prev pointer in structs (Linux kernel), and the like.

C++ is the de facto language for high-performance computing, so I very much question the “you don’t write C as C++” part; if anything, you don’t write C++ as C, as that would be inefficient.

Generic things are rarely efficient; the most optimal code tends to be specialized and tailored to specific hardware and/or the kind of data it's operating on.

std::vector (which is a really inefficient way of doing dynamic arrays btw) can be cleanly implemented with macros (see stb stretchy buf) or by splitting the element data from the housekeeping data:

  int append(void *arr, size_t elemsize, size_t *capacity, size_t *size, const void *items_to_add, size_t num_items_to_add);
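
A rough sketch of what a body for such a function might look like. This is my own guess, not stb's code. I renamed it `append_items` and changed `arr` to `void **` (an assumption on my part) so that the function can hand back a reallocated buffer. The doubling growth policy and the initial capacity of 8 are arbitrary choices as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: append num_items elements of elemsize bytes each to *arr.
 * Capacity and size are counted in elements and kept by the caller.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (in which case the array is left unchanged). */
int append_items(void **arr, size_t elemsize, size_t *capacity, size_t *size,
                 const void *items, size_t num_items)
{
    assert(arr != NULL && capacity != NULL && size != NULL && elemsize > 0);

    if (num_items == 0)
        return 0;
    if (num_items > SIZE_MAX - *size)
        return -1;                          /* element count would overflow */

    size_t needed = *size + num_items;
    if (needed > *capacity) {
        size_t new_cap = *capacity ? *capacity : 8;
        while (new_cap < needed) {
            if (new_cap > SIZE_MAX / 2) {   /* doubling would overflow */
                new_cap = needed;
                break;
            }
            new_cap *= 2;
        }
        if (new_cap > SIZE_MAX / elemsize)
            return -1;                      /* byte count would overflow */

        void *p = realloc(*arr, new_cap * elemsize);
        if (p == NULL)
            return -1;
        *arr = p;
        *capacity = new_cap;
    }

    memcpy((char *)*arr + *size * elemsize, items, num_items * elemsize);
    *size = needed;
    return 0;
}
```

The caller keeps a `void *` plus two `size_t` values per array and casts to the element type when reading, e.g. `((int *)v)[i]`.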
How is std::vector inefficient?

Especially since that macro hack from stretchy buf seems to do it in an even more naive way.

Splitting the element data is a different implementation with very different performance characteristics - it’s quite a bad thing if I have to resort to that due to a language inefficiency, especially in case of a language that is supposedly close to the hardware.

There are various constraints on std::vector because of language in the standard which makes concessions for generic use that might not apply to your application. Small vector optimizations aren’t possible in std::vector, also some operations that could be done in-place can’t be. You also give up control of some meta-parameters and allocation strategies that may be more efficient for your use case.
Six arguments - seriously? Avoiding that is the point of generic programming, and it's probably more efficient there too.
You're talking about something that isn't related to efficiency. Copy and pasting, macros, generating code -- none of these preclude producing an efficient solution.

There is nothing in C++ that is inherently more efficient than C.

Except that more efficient solutions can be implemented much more practically? Solutions that you'd need to bend over backwards for in C?
What does that have to do with efficiency? We don't appear to be debating language ergonomics, but the notion that C is somehow inferior to C++ when it comes to performance.
C++ has a lot of compile time programming features that C cannot do practically. There are sometimes alternatives to those mechanisms in C, but they rely on mangling, macros, non-portable tricks, and so on.

On the topic of performance, the best counterargument to C++ from a C perspective would be that hand-rolled code generation isn't all that bad in practice. It's just that language theorists don't like that approach aesthetically.

> hacks like conventionally placing the next/prev pointer in structs

This is not a hack, it is the way it should be in C.
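
For anyone who hasn't seen the pattern, a small sketch of an intrusive doubly linked list, in the spirit of the kernel's approach but not a copy of it. All names here (`list_node`, `CONTAINER_OF`, `list_push_back`, `struct task`) are made up for the example. The link lives inside the payload struct, and `offsetof` recovers the payload from a pointer to the link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list: the node is embedded in the user's struct. */
struct list_node {
    struct list_node *prev, *next;
};

/* Recover a pointer to the enclosing struct from a pointer to its member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head node that points at itself. */
void list_init(struct list_node *head)
{
    assert(head != NULL);
    head->prev = head;
    head->next = head;
}

/* Link n in just before the head, i.e. at the tail. No allocation happens. */
void list_push_back(struct list_node *head, struct list_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Example payload type; any struct can join a list by embedding a node. */
struct task {
    int id;
    struct list_node link;
};

/* Walk the list and add up the ids of the tasks linked into it. */
int sum_task_ids(const struct list_node *head)
{
    int sum = 0;
    for (const struct list_node *n = head->next; n != head; n = n->next)
        sum += CONTAINER_OF(n, const struct task, link)->id;
    return sum;
}
```

The list code never needs to know the element type, which is how one implementation serves every struct without macros that generate code per type.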

Except that that's a linked list and not an array.
You can store your 'list items' in an array and still link to random items in the array - although an index instead of a pointer would make more sense in that case, but what else is an index than a pointer with fewer bits ;) The main advantage being that you don't need to alloc/free individual items.
> I very much question the “you don’t write C as C++” part; if anything, you don’t write C++ as C, as that would be inefficient.

Yeah, I was thinking about your string example when I wrote that. For high performance numerical code, I can see the advantages in using C++.

People who know how to use C rarely if ever have problems with undefined behavior. I personally have written a huge amount of C code and my bugs have never been related to undefined behavior. This is an idea that has been spread to make people even more afraid of using C/C++. While there is a possibility of finding these problems, in practice it is almost a non-issue.
TBF, did you ever run your code with UBSAN enabled? There's a couple of UB cases which don't trigger any bugs until one of the popular C compilers changes some details in their optimizer passes, and which then only manifest with a specific combination of compiler options.
> People who know how to use C rarely if ever have problems with

I think we call this "No True Scotsman".

In real life lots of people write C because they want to or have to and they generate tons of bugs from bug classes that just aren't present at all in other languages.

> People who know how to use C rarely if ever have problems with undefined behavior.

I think the CVE database would disagree with that statement.

Same here, no problems with undefined behavior. Also, no memory issues either, once I've finished checking the code with Valgrind.
“No memory issues in the tested state space.” That’s the only thing Valgrind can say. It says nothing about how a run with different input would behave; it might still segfault, leak, use memory after free, or hit UB.
That is always the case on any platform. Just because something works on a Mac doesn't mean it will work on a PC, or vice versa. If a language has multiple compilers, you also need to test with different compilers to make sure your code works there too. You're trying to make this out to be a C-only issue, when it is a general issue, maybe with different names.
C is an expressive language when you’re not working with strings and memory the way you do in most HLLs. Almost all operators return a value and can be nested in sub-expressions. Assignments and pre/post increments/decrements are expressions. The comma operator evaluates expressions in the given order and returns the value of the last one. There is a GNU extension called “statement expressions”, which lets function-like macros contain statements and still yield a value.
> If a line of code doesn't look like a function call, it's not calling anything.
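
A small sketch of what that looks like in standard C (no GNU extensions). The function names are hypothetical. The first loop leans on assignment and post-increment being expressions; the second uses the comma operator in the `for` update clause.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src (including its NUL) into dst and return the number of bytes
 * copied before the NUL. The whole copy step is one expression: the
 * assignment yields the byte just stored, which the loop then tests. */
size_t copy_str(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = 0;
    while ((*dst++ = *src++) != '\0')
        n++;
    return n;
}

/* Reverse a string in place. `i++, j--` is the comma operator: both
 * sub-expressions are evaluated, left to right, in a single clause.
 * (The `i = 0, j = ...` part is a declaration list, not the operator.) */
void reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```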

In C, if you, for example, write past the bounds of an array or otherwise do something that causes UB, there is no guarantee that the code you wrote in the source file is actually going to be what's run.

If an attacker can clobber the stack (for example), the control flow you see in the source code and the actual control flow of the program are not the same.

In the worst case, an attacker can get your program to execute arbitrary code of their own choosing!

Maybe some consider this unrelated to the no implicit control flow thing, but I think when UB caused by a trivial mistake can alter your control flow, you have much bigger worries than an operator being sugar for calling a function.

I consider UB and arbitrary code execution exploits to be a case of implicit control flow!

> These statements are not true in languages which support operator overloading

I guess I will never understand the C and Java developers' incredible fear of operator overloading.

Do you have the same reaction to user-defined functions? Because they are exactly the same thing. Is it because of the bad type system that won't let you know what operator you are using?

> I guess I will never understand the C and Java developers' incredible fear of operator overloading.

The answer is in the sentences right before the one you quoted:

> Relatedly, it's more explicit than almost any other language. If a line of code doesn't look like a function call, it's not calling anything. There is no hidden control flow.

Consider the use-cases for C: operating system kernels, hard real-time software, low-level libraries, databases, embedded software. What is a common desire among these? Predictable low latency and high throughput.

It's much easier to achieve these features if your language does not allow "magic." Implicit allocations, RAII, exceptions, overloaded operators; these are all examples of features which allow a library-writer to inject hidden control flow into your code. This can make it very difficult to analyze why code runs slowly or with unexpected random pauses, not to mention making it much harder to step through in a debugger.

The control flow is the same; you evaluate the parameters, and then evaluate the operator. Just like any other function call, there's nothing implicit or hidden. The only difference is that you can't create other operators with the same name for different types.

And whether something is called or run inline is always decided by the compiler. Modern C doesn't promise you any relation between the way you break down your functions in your code and the actual function calls in the assembly it generates.

So, I keep seeing people complaining about overloading, always with the same reasons, which are patently not valid unless there's some implicit assumption they keep not stating. What is that assumption that breaks the equivalence between user-defined functions and operators?

> Just like any other function call, there's nothing implicit or hidden.

The implicit part is the question of whether an operator is built-in or overloaded. In C, every operator is built-in, so you can look at a block of code and see that there are NO function calls in it. With something like C++, you must treat every operator like a function call.

With C, if I write:

    a += b;
I can be VERY confident that this line of code will execute in constant time. With C++ (or other operator-overloaded language), I cannot. I need to know what the types of a and b are, and I need to go look up the += operator to see what it does for these types (and this is not one universal place, it's specific to the type).

Furthermore, this may be the last line within a particular scope. With C I know that nothing else will happen, and that the control flow depends only on the surrounding scope. With C++, I don't know this! There may have been many objects created within this scope and now their destructors are firing and potentially very large trees of objects are being cleaned up and deallocated, and even slow IO operations running.

> With C++ (or other operator-overloaded language), I cannot

All programming requires people to follow reasonable conventions. In C++, if you make a dereference operator that doesn't run in constant time, or an equality operator that doesn't follow equality semantics, the programmer messed up. It's like giving a function a misleading name: calling it `doThis()` when it doesn't do that.

Note that Java is filled with these kinds of conventions, such as overloading `equals`. How can you be certain it actually obeys equality semantics? You have to trust the programmer.

If I see `x+y` in C, I know 100% that it'll be ~0-1 instructions, O(1), and will have the lowest latency & highest throughput that a thing can have, i.e. basically completely ignorable for figuring out the perf of a piece of code, or determining what complex things it may do (additionally, it'll hint that the operands are pointers or numbers). For `f(x,y)`, none of those may hold. With operator overloading, f(x,y) and x+y have the exact same amount of instantly tellable facts, i.e. none. x+y becomes just another way to do an arbitrary thing.

In C, if I'm searching for how a certain thing may be called from a given function, I only have to look for /\w\(/ and don't have to ever think about anything else.

Honestly, operator overloading isn't really that bad (especially if an IDE can highlight which ones are overloaded), but it's still a thing that can affect how one has to go about reading code that might not even use it.

However, as a novice I found it unintuitive that on an embedded platform without hardware floats, x/y will still compile, but to a polyfill with quite a few instructions.
That’s the only caveat. With operator overloading, the scope for what happens on a given line of code expands dramatically. Now your entire dependency graph is part of the search space. Heck, the operator might not even terminate at all!
Right, that's definitely quite a strong point against the C operator-function separation. There can be a good argument made for just not providing unavailable operations as operators. But, still, x/y won't touch any of your memory (assuming a non-broken stdlib), so you're still free to skip over it while scanning for a use-after-free or something.
User-defined functions require a function call preamble and postamble to be added to the machine instructions that execute the function's behavior. Typically this consists of growing the stack, adjusting the required pointers at the top, and then undoing that at the end. In C, the operators defined by the language implementation do not involve any adjustments to the stack frame and do not invoke a ‘call’ or jump instruction in the assembly. Once operator overloading is possible, this difference immediately becomes blurred.
I would say that C macros have inspired the development of concoctions of far greater magical qualities than, say, RAII. C programmers are not immune to violating the principle of least astonishment.
Because C functions are _typically_ globally defined, their identifiers are mostly unique, which is a good thing for code readability.

Of course, C functions can be passed as variables. Or in a wider scope they might be inline, macros, or ifdef'd to different functions. But those cases are _typically_ recognized as undesirable and avoided.

Java's a bit of a different story, which I can't figure out a good way to explain. It's hard to explain problems in large code bases, as a quick example rarely suffices. I've seen more than one bug caused because foo.bar(qux) called a different bar method than the original programmer intended (both because foo's bar was overridden and because qux was a different type than expected).

Don't get me wrong, I would use operator overloading in a heartbeat if I was writing code for a math-y CS coding assignment. It's fine for code that will have a lifespan measured in weeks / months with probably only 2 or 3 people ever looking at it.

Saying what you mean, as clearly and directly as possible, has its perks in certain applications (large code bases, life-critical code bases, code bases that will last for decades with dozens of programmers). Otherwise stated, cases where code is going to be read many times more than written.

To answer your question more directly: User definable functions aren't a problem. Re-definable functions are!

> If a line of code doesn't look like a function call, it's not calling anything.

Why is that important to you?

Some of the worst bugs I have experienced are ones where code is executing without it being clear where it is executing. The front end stack is the most awful about this, where at any given moment all kinds of things might happen without notice. A clear, sequential program can be stepped through and understood.
It's important for reading other people's code. When I see a function call then I know that "anything" can happen inside that function, so I better investigate. For anything that's not a function call it is obvious what happens under the hood.

In languages like C++ I potentially need to check every operator if it is overloaded, and find the place where that happens (I think I haven't seen any IDE support to help with 'resolving' overloaded operators, but maybe that has improved in the meantime).

I haven't touched C for 20 years, but it makes code easy to read and understand, and it makes debugging and hunting for errors easier.
> If a line of code doesn't look like a function call, it's not calling anything.

Except maybe when allocating dynamic arrays, or for floating-point ops if those don't exist in hardware. Then you have signal handlers that can be called on math errors, segmentation faults, and so on. So basically every line in your code can implicitly call a function.

> If I give a Linux user the source of a C program, they can probably compile it with the tools they already have.

What is win32.h and why is it missing?

> This will most likely be the case 20 years from now too

What is Xlib.h and what do you mean I have to rewrite the app's front end from scratch?

Thank you for the input, one point that stood out for me was that you prefer to write command-line utilities with C.

Why is that? I mostly use scripting languages in my day job (Ruby, and some Python because of AI) and have found my productivity writing command-line utilities in Ruby to be amazing. Do you do it because of performance, ease of use because you are proficient, a mix of both, or something else?

Oh, I don't use C for every command-line utility. If it's something IO-bound like parsing a webpage and downloading a bunch of files linked from it, I write it in Python. The convenience of modules like argparse and requests is very hard to beat, and it would take me a lot longer to do it in C.

I reach for C when performance matters, for example when processing multi-GB files or looking for perceptual hashes that are similar. It can be a difference between minutes and hours of running time.

With C23 there's the #embed feature. Might be super useful for embedded software. How are the toolkits e.g. for ESP32 and TI in terms of C23 compatibility?
> How are the toolkits e.g. for ESP32 and TI in terms of C23 compatibility?

C23 hasn't been released yet so it's hard to talk about compatibility. It's probably going to take a few years before it's widely supported.

> The only real competitor to C here is Zig.

Why only Zig?

It's the only language I'm aware of that takes C's explicitness and pushes it even further: it bans some implicit conversions, and it makes you pass an allocator as an argument to functions which can allocate memory. Most languages choose to go the other way and introduce features like try/catch and operator overloading.
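
To make the allocator point concrete for C readers, here is a rough sketch of the same convention written in C. This is not Zig's actual interface; `struct allocator`, `counting_allocator` and `dup_string` are all hypothetical names. A function that needs memory takes the allocator as a parameter, so the call site shows that allocation can happen and the caller decides where the memory comes from.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface passed explicitly to functions. */
struct allocator {
    void *(*alloc)(void *ctx, size_t size);
    void (*release)(void *ctx, void *ptr);
    void *ctx;                      /* allocator-specific state */
};

/* Example allocator: forwards to malloc/free and counts live blocks in *ctx. */
static void *counting_alloc(void *ctx, size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++*(int *)ctx;
    return p;
}

static void counting_release(void *ctx, void *ptr)
{
    if (ptr != NULL) {
        --*(int *)ctx;
        free(ptr);
    }
}

/* Build an allocator whose live-block counter is stored in *live_count. */
struct allocator counting_allocator(int *live_count)
{
    struct allocator a = { counting_alloc, counting_release, live_count };
    return a;
}

/* Copy a string using the caller's allocator. Returns NULL on failure.
 * The allocator parameter is the visible signal that this can allocate. */
char *dup_string(struct allocator *a, const char *s)
{
    assert(a != NULL && s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = a->alloc(a->ctx, n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

In C this is only a convention that a code base can choose to follow, which is the difference being pointed out: nothing stops a function from calling malloc behind your back.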