Hacker News
by babarock 1069 days ago
People who still write C, honest question: Why?

C is full of quirks. From cryptic "undefined behaviors" to a type system that isn't really a type system (more like "size hints for the compiler"), the language doesn't feel easy to use/debug. Add to this CPP macros, a universally recognized bad idea, a clunky import system, and lack of a single reference implementation of the compiler/libC, and you have a language that is harsh to defend.

Also, documentation is all over the place. If a function isn't described in `man`, I have no idea where else to actually look for it.

I used to think "C presents the most honest representation of the low-level mechanisms of the computer", but... even this is shaky. I've been programming for almost 15 years now, and I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address. The C representation of memory (and all the pointer arithmetic) is not a real representation of your hardware, and this too is an abstraction.

So, setting aside the need to maintain 30+ year old code, what would be modern reasons to start a new project in C?

46 comments

1. It gives me a lot of control over how the program works, which lets me create programs that work faster and use less memory than would be possible in most other languages.

2. Relatedly, it's more explicit than almost any other language. If a line of code doesn't look like a function call, it's not calling anything. There is no hidden control flow. These statements are not true in languages which support operator overloading or exceptions. The only real competitor to C here is Zig.

3. If I give a Linux user the source of a C program, they can probably compile it with the tools they already have. This will most likely be the case 20 years from now too, as long as I keep my C mostly standard-compliant. I'm not sure that code in newer, faster-moving languages like Rust will stay compilable as long.

4. It's a lingua franca. C libraries can be used from most programming languages without too much effort.

I probably wouldn't start a large project on a tight deadline in C, but I think it's a great language for writing new command-line utilities and for rewriting tricky algorithmic code from scripting languages. I've gotten 100x and even 1000x speedups from replacing a couple of Python functions with C.

The ease of use is about to improve with the C23 standard, which I'm very happy with. On the other hand, some tricky areas like aliasing are likely to stay tricky forever.

> It gives me a lot of control over how the program works, which lets me create programs that work faster and use less memory than would be possible in most other languages.

While it is true to a degree, I would also add that due to its low level of expressivity, you often have to introduce less efficient solutions simply because of language deficiencies. Things like small string optimizations in C++ are simply not possible in C.

2 is true, but it comes at the cost of poor expressivity; see the previous point.

3. Well, will it really compile to what you meant? If you have UB, it might still compile but the semantics of your program could change entirely depending on which compiler and which version you use.

Also, your Python point: well, that’s because you used Python in the first place, which is very slow even among scripting languages.

> Things like small string optimizations in C++ are simply not possible in C.

I don't think this is true; I've seen a bunch of libraries implement SSO in C:

https://nullprogram.com/blog/2016/10/07/

https://github.com/stclib/STC/blob/master/include/stc/cstr.h

https://github.com/mystborn/sso_string/blob/master/include/s...
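
As a rough illustration of the idea behind such libraries, here is a minimal sketch of a small string optimization in plain C. The names (`sso_string`, `sso_init`, `sso_cstr`, `sso_free`, `SSO_CAP`) and the layout are assumptions made up for this example; the linked libraries use more compact layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { SSO_CAP = 23 };  /* hypothetical inline capacity, excluding the terminator */

typedef struct sso_string {
    size_t len;
    bool on_heap;                      /* true when data.heap owns a malloc'd buffer */
    union {
        char inline_buf[SSO_CAP + 1];  /* short strings live here, no allocation */
        char *heap;                    /* long strings live on the heap */
    } data;
} sso_string;

/* Copies text into s; returns false (leaving s empty) if allocation fails. */
bool sso_init(sso_string *s, const char *text)
{
    size_t len = strlen(text);
    char *dst = s->data.inline_buf;

    s->len = 0;
    s->on_heap = false;
    if (len > SSO_CAP) {
        dst = malloc(len + 1);
        if (dst == NULL) {
            s->data.inline_buf[0] = '\0';
            return false;
        }
        s->data.heap = dst;
        s->on_heap = true;
    }
    memcpy(dst, text, len + 1);  /* copy the terminator as well */
    s->len = len;
    return true;
}

const char *sso_cstr(const sso_string *s)
{
    return s->on_heap ? s->data.heap : s->data.inline_buf;
}

void sso_free(sso_string *s)
{
    if (s->on_heap)
        free(s->data.heap);
}

/* Usage: a short string never touches the allocator. */
void sso_demo(void)
{
    sso_string s;
    if (!sso_init(&s, "short"))
        return;
    assert(!s.on_heap);
    assert(strcmp(sso_cstr(&s), "short") == 0);
    sso_free(&s);
}
```

The separate `on_heap` flag wastes a few bytes compared with packing the tag into the buffer, but it keeps the sketch easy to follow.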

> due to its low level of expressivity, you often have to introduce less efficient solutions simply because of language deficiencies. Things like small string optimizations in C++ are simply not possible in C.

You don't have to use an inefficient solution. You can always roll your own optimized solution or use a library. I agree that C++ has some nice string optimizations built into the standard library, but it's not obvious to me that they're always better than the simplicity and predictability of a simple chunk of memory.

Besides, you generally don't write C code in the same way you write C++ but with more primitive tools. You often allocate a buffer once and operate on it; you don't emulate passing strings by value from function to function, doing lots of allocations and deallocations in the process.

> Well, will it really compile to what you meant? If you have UB, it might still compile but the semantics of your program could change entirely depending on which compiler and which version you use.

I'm not sure what point you're making here. If you have bugs in your program then it may work incorrectly, yes, but that's true no matter the language.

> You don't have to use an inefficient solution. You can always roll your own optimized solution or use a library

That’s not true. You can’t, for example, write a generic, efficient vector implementation in C - the language itself can’t do that. You either have to copy-paste the same code for different sizes, or make use of some monstrous hack of a macro. Instead, projects use hacks like conventionally placing the next/prev pointer in structs (Linux kernel), and the like.

C++ is the de facto language for high performance computing, so I very much question the “you don’t write C as C++” part; if anything, you don’t write C++ as C, as that would be inefficient.

Generic things are rarely efficient; the most optimal code tends to be specialized and tailored to specific hardware and/or the kind of data it's operating on.

std::vector (which is a really inefficient way of doing dynamic arrays btw) can be cleanly implemented with macros (see stb stretchy buf) or by splitting the element data from the housekeeping data:

  int append(void *arr, size_t elemsize, size_t *capacity, size_t *size, const void *items_to_add, size_t num_items_to_add);
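
For illustration, a minimal sketch of how such a function might be implemented. It deviates from the signature above in one respect, which is an assumption of this example: `arr` is taken as `void **` so that the function can hand a reallocated buffer back to the caller. The initial capacity of 8 and the doubling policy are also arbitrary choices.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Appends num_items elements of elemsize bytes each to *arr.
 * Returns 0 on success, -1 on size overflow or allocation failure
 * (in which case the array is left unchanged). */
int append(void **arr, size_t elemsize, size_t *capacity, size_t *size,
           const void *items, size_t num_items)
{
    assert(elemsize > 0);
    if (num_items == 0)
        return 0;
    if (num_items > SIZE_MAX - *size)
        return -1;

    size_t needed = *size + num_items;
    if (needed > *capacity) {
        size_t new_cap = *capacity ? *capacity : 8;
        while (new_cap < needed) {
            if (new_cap > SIZE_MAX / 2) {  /* doubling would overflow */
                new_cap = needed;
                break;
            }
            new_cap *= 2;
        }
        if (new_cap > SIZE_MAX / elemsize)
            return -1;
        void *grown = realloc(*arr, new_cap * elemsize);
        if (grown == NULL)
            return -1;
        *arr = grown;
        *capacity = new_cap;
    }

    /* The housekeeping data lives with the caller; only raw bytes are copied here. */
    memcpy((char *)*arr + *size * elemsize, items, num_items * elemsize);
    *size = needed;
    return 0;
}
```

A caller would keep `void *data = NULL; size_t cap = 0, n = 0;`, call `append(&data, sizeof(int), &cap, &n, values, count)`, read elements through `(int *)data`, and eventually `free(data)`.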

How is std::vector inefficient?

Especially since that macro hack from stretchy buf seems to do it in an even more naive way.

Splitting the element data is a different implementation with very different performance characteristics - it’s quite a bad thing if I have to resort to that due to a language inefficiency, especially in case of a language that is supposedly close to the hardware.

Six arguments - seriously? Avoiding that is the point of generic programming, and it’s probably more efficient there too.

You're talking about something that isn't related to efficiency. Copy and pasting, macros, generating code -- none of these preclude producing an efficient solution.

There is nothing in C++ that is inherently more efficient than C.

Except that more efficient solutions can be implemented much more practically? Solutions that you'd need to bend over backwards for in C?
C++ has a lot of compile time programming features that C cannot do practically. There are sometimes alternatives to those mechanisms in C, but they rely on mangling, macros, non-portable tricks, and so on.

On the topic of performance, the best counterargument to C++ from a C perspective would be that hand-rolled code generation isn't all that bad in practice. It's just that language theorists don't like that approach aesthetically.

> hacks like conventionally placing the next/prev pointer in structs

This is not a hack, it is the way it should be in C.
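
For readers unfamiliar with the convention being discussed, here is a minimal sketch of an intrusive doubly linked list. It is only loosely modeled on the Linux kernel's approach and is not the kernel's code; `list_node`, `CONTAINER_OF`, `task`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The links live inside the payload struct, so one list implementation
 * serves every struct type without templates or per-node allocation. */
struct list_node {
    struct list_node *prev, *next;
};

/* Recover the enclosing struct from a pointer to its embedded node. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct task {                 /* hypothetical payload type */
    int id;
    struct list_node link;
};

/* An empty list is a head node that points at itself. */
void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

void list_push_back(struct list_node *head, struct list_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

int sum_task_ids(struct list_node *head)
{
    int sum = 0;
    for (struct list_node *n = head->next; n != head; n = n->next)
        sum += CONTAINER_OF(n, struct task, link)->id;
    return sum;
}

/* Usage: the tasks are ordinary stack objects; the list allocates nothing. */
void list_demo(void)
{
    struct list_node head;
    struct task a = { .id = 1 }, b = { .id = 2 };

    list_init(&head);
    list_push_back(&head, &a.link);
    list_push_back(&head, &b.link);
    assert(sum_task_ids(&head) == 3);
}
```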

Except that that's a linked list and not an array.

> I very much question the “you don’t write C as C++” part; if anything, you don’t write C++ as C, as that would be inefficient.

Yeah, I was thinking about your string example when I wrote that. For high performance numerical code, I can see the advantages in using C++.

People who know how to use C rarely if ever have problems with undefined behavior. I personally have written a huge amount of C code and my bugs have never been related to undefined behavior. This is an idea that has been spread to make people even more afraid of using C/C++. While there is a possibility of finding these problems, in practice it is almost a non-issue.
TBF, did you ever run your code with UBSAN enabled? There's a couple of UB cases which don't trigger any bugs until one of the popular C compilers changes some details in their optimizer passes, and which then only manifest with a specific combination of compiler options.
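
One commonly cited case of this kind is signed integer overflow: a check written as `x + 1 < x` relies on undefined behavior, so an optimizer is allowed to treat it as always false. A minimal sketch of a check that stays within defined behavior follows; the function name `add_would_overflow` is made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Reports whether a + b would overflow int, without ever computing a + b.
 * Every subtraction below is guaranteed to stay in range. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Usage: test first, add only when it is safe. */
void overflow_demo(void)
{
    int x = INT_MAX;
    assert(add_would_overflow(x, 1));
    assert(!add_would_overflow(x, -1));
}
```

Building with `-fsanitize=undefined` on GCC or Clang reports the overflowing form at run time, but only on the code paths the tests actually exercise.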
> People who know how to use C rarely if ever have problems with

I think we call this "No True Scotsman".

In real life lots of people write C because they want to or have to and they generate tons of bugs from bug classes that just aren't present at all in other languages.

> People who know how to use C rarely if ever have problems with undefined behavior.

I think the CVE database would disagree with that statement.

Same here, no problems with undefined behavior. Also, no memory issues either once the code has been finalized and checked with Valgrind.

“No memory issues in the tested state space”. That’s the only thing Valgrind can say. It says nothing about how a run with different input would behave; it might still segfault, leak, use memory after free, or hit UB.

That is always the case on any platform. Just because something works on a Mac, it will not necessarily work on a PC, or vice versa. If a language has multiple compilers, you also need to test with different compilers to make sure your code works there too. You're trying to make this out to be a C-only issue, when it is a general issue, maybe with different names.
C is an expressive language when you’re not working with strings and memory the way you do in most HLLs. Almost all operators return a value and can be nested in sub-expressions. Assignments and pre/post increments/decrements are expressions. The comma operator evaluates expressions in the given order and returns a value. There is a GNU extension called “statement expressions”, allowing you to define function-like macros.
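
A small sketch of the features listed above. The macro `MAX_ONCE` and the function name are invented for this example, and the statement expression (with `__typeof__` and `__extension__`) is a GNU extension that assumes GCC or Clang.

```c
#include <assert.h>

/* GNU statement expression: the value of the last statement is the value
 * of the whole ({ ... }) block. Each argument is evaluated exactly once. */
#define MAX_ONCE(a, b) __extension__ ({ \
    __typeof__(a) a_ = (a);             \
    __typeof__(b) b_ = (b);             \
    a_ > b_ ? a_ : b_;                  \
})

int expression_demo(void)
{
    int x, y;
    int i = 0;

    x = y = 5;                 /* assignment is an expression: both become 5 */
    int last = (x++, y + x);   /* comma operator: x++ runs first, result is y + x */
    int m = MAX_ONCE(i++, 0);  /* i++ is evaluated once, so i ends up as 1 */

    assert(x == 6 && y == 5);
    assert(i == 1 && m == 0);
    return last;               /* 5 + 6 */
}
```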
>If a line of code doesn't look like a function call, it's not calling anything.

In C, if you, for example, write past the bounds of an array or otherwise do something that causes UB, there is no guarantee that the code you wrote in the source file is actually going to be what's run.

If an attacker can clobber the stack (for example), the control flow you see in the source code and the actual control flow of the program are not the same.

In the worst case, an attacker can get your program to execute arbitrary code of their own choosing!

Maybe some consider this unrelated to the no implicit control flow thing, but I think when UB caused by a trivial mistake can alter your control flow, you have much bigger worries than an operator being sugar for calling a function.

I consider UB and arbitrary code execution exploits to be a case of implicit control flow!

> These statements are not true in languages which support operator overloading

I guess I will never understand the C and Java developers' incredible fear of operator overloading.

Do you have the same reaction to user-defined functions? Because they are exactly the same thing. Is it because of the bad type system that won't let you know what operator you are using?

> I guess I will never understand the C and Java developers' incredible fear of operator overloading.

The answer is in the sentences right before the one you quoted:

> Relatedly, it's more explicit than almost any other language. If a line of code doesn't look like a function call, it's not calling anything. There is no hidden control flow.

Consider the use-cases for C: operating system kernels, hard real-time software, low-level libraries, databases, embedded software. What is a common desire among these? Predictable low-latency and high throughput.

It's much easier to achieve these features if your language does not allow "magic." Implicit allocations, RAII, exceptions, overloaded operators; these are all examples of features which allow a library-writer to inject hidden control flow into your code. This can make it very difficult to analyze why code runs slowly or with unexpected random pauses, not to mention making it much harder to step through in a debugger.

The control flow is the same; you evaluate the parameters, and then evaluate the operator. Just like any other function call, there's nothing implicit or hidden. The only difference is that you can't create other operators with the same name for different types.

And whether something is called or run inline is always decided by the compiler. Modern C doesn't promise you any relation between the way you break down your functions in your code and the actual function calls in the assembly it generates.

So, I keep seeing people complaining about overloading, always with the same reasons, which are patently not valid unless there's some implicit assumption they keep not stating. What is that assumption that breaks the equivalence between user-defined functions and operators?

> Just like any other function call, there's nothing implicit or hidden.

The implicit part is the question of whether an operator is built-in or overloaded. In C, every operator is built-in, so you can look at a block of code and see that there are NO function calls in it. With something like C++, you must treat every operator like a function call.

With C, if I write:

    a += b;
I can be VERY confident that this line of code will execute in constant time. With C++ (or other operator-overloaded language), I cannot. I need to know what the types of a and b are, and I need to go look up the += operator to see what it does for these types (and this is not one universal place, it's specific to the type).

Furthermore, this may be the last line within a particular scope. With C I know that nothing else will happen, and that the control flow depends only on the surrounding scope. With C++, I don't know this! There may have been many objects created within this scope and now their destructors are firing and potentially very large trees of objects are being cleaned up and deallocated, and even slow IO operations running.

> With C++ (or other operator-overloaded language), I cannot

All programming requires people to follow reasonable conventions. In C++ if you make a dereference operator with non-constant time, or an equality operator which doesn't follow equality semantics, the programmer messed up. It's like giving a function a misleading name, such as `doThis()` when it doesn't do that.

Note that Java is filled with these kinds of conventions, such as overloading `equals`. How can you be certain it actually obeys equality semantics? You have to trust the programmer.

If I see `x+y` in C, I know 100% that it'll be ~0-1 instructions, O(1), and will have the lowest latency & highest throughput that a thing can have, i.e. basically completely ignorable for figuring out the perf of a piece of code, or determining what complex things it may do (additionally, it'll hint that the operands are pointers or numbers). For `f(x,y)`, none of those may hold. With operator overloading, f(x,y) and x+y have the exact same amount of instantly tellable facts, i.e. none. x+y becomes just another way to do an arbitrary thing.

In C, if I'm searching for how a certain thing may be called from a given function, I only have to look for /\w\(/ and don't have to ever think about anything else.

Honestly, operator overloading isn't really that bad (especially if an IDE can highlight which ones are), but it's still a thing that can affect how one has to go about reading code that might not even use it.

However, as a novice I found it unintuitive that on an embedded platform without hardware floats, x/y will compile, but it compiles to a polyfill with quite a few instructions.

User-defined functions require a function call preamble and postamble to be added to the machine instructions that execute the function's behavior. Typically this consists of growing the stack, adjusting the required pointers at the top, and then undoing that at the end. In C, the operators defined by the language implementation do not involve any adjustments to the stack frame and do not invoke a ‘call’ or jump instruction in the assembly. Once operator overloading is possible, this difference immediately becomes blurred.

I would say that C macros have inspired the development of concoctions of far greater magical qualities than, say, RAII. C programmers are not immune to violating the principle of least astonishment.

C functions are _typically_ globally defined, and mostly unique identifiers are a good thing in terms of code readability.

Of course, C functions can be passed as variables. Or in a wider scope they might be inline, macros, or ifdef'd to different functions. But those cases are _typically_ recognized as undesirable and avoided.

Java's a bit of a different story, which I can't figure out a good way to explain. It's hard to explain problems in large code bases, as a quick example rarely suffices. I've seen more than one bug caused because foo.bar(qux) called a different bar method than the original programmer intended (both because foo's bar was overridden and because qux was a different type than expected).

Don't get me wrong, I would use operator overloading in a heartbeat if I was writing code for a math-y CS coding assignment. It's fine for code that will have a lifespan measured in weeks / months with probably only 2 or 3 people ever looking at it.

Saying what you mean, as clearly and directly as possible, has its perks in certain applications (large code bases, life-critical code bases, code bases that will last for decades with dozens of programmers). Put another way, these are cases where code is going to be read many times more than it is written.

To answer your question more directly: User definable functions aren't a problem. Re-definable functions are!

> If a line of code doesn't look like a function call, it's not calling anything.

Why is that important to you?

Some of the worst bugs I have experienced are ones where code is executing without it being clear where it is executing. The front end stack is the most awful about this, where at any given moment all kinds of things might happen without notice. A clear, sequential program can be stepped through and understood.
It's important for reading other people's code. When I see a function call then I know that "anything" can happen inside that function, so I better investigate. For anything that's not a function call it is obvious what happens under the hood.

In languages like C++ I potentially need to check every operator if it is overloaded, and find the place where that happens (I think I haven't seen any IDE support to help with 'resolving' overloaded operators, but maybe that has improved in the meantime).

I haven't touched C for 20 years, but it makes code easy to read and understand. It also makes debugging and hunting for errors an easier task.
> If a line of code doesn't look like a function call, it's not calling anything.

Except maybe allocating dynamic arrays, or floating point ops if those don't exist in hardware. Then you have signal handlers that can be called on math errors, segmentation faults, and so on. So basically every line in your code can implicitly call a function.

> If I give a Linux user the source of a C program, they can probably compile it with the tools they already have.

What is win32.h and why is it missing?

> This will most likely be the case 20 years from now too

What is Xlib.h and what do you mean I have to rewrite the app's front end from scratch?

Thank you for the input, one point that stood out for me was that you prefer to write command-line utilities with C.

Why is that? I mostly use scripting languages in my day-to-day work (Ruby and some Python because of AI) and have found that my productivity with command-line utilities is amazing with Ruby. Do you do it because of performance, ease of use because you are proficient, a mix of both, or something else?

Oh, I don't use C for every command-line utility. If it's something IO-bound like parsing a webpage and downloading a bunch of files linked from it, I write it in Python. The convenience of modules like argparse and requests is very hard to beat, and it would take me a lot longer to do it in C.

I reach for C when performance matters, for example when processing multi-GB files or looking for perceptual hashes that are similar. It can be a difference between minutes and hours of running time.

With C23 there's the #embed feature. Might be super useful for embedded software. How are the toolkits e.g. for ESP32 and TI in terms of C23 compatibility?
> How are the toolkits e.g. for ESP32 and TI in terms of C23 compatibility?

C23 hasn't been released yet so it's hard to talk about compatibility. It's probably going to take a few years before it's widely supported.

> The only real competitor to C here is Zig.

Why only Zig?

It's the only language I'm aware of that takes C's explicitness and pushes it even further: it bans some implicit conversions, and it makes you pass an allocator as an argument to functions which can allocate memory. Most languages choose to go the other way and introduce features like try/catch and operator overloading.
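
The allocator-passing convention can be imitated in C, although nothing in the language enforces it. A minimal sketch follows; the `allocator` struct and all function names are hypothetical and do not mirror Zig's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface: any function that may allocate
 * takes one of these, so allocation is visible in its signature. */
typedef struct allocator {
    void *(*alloc)(void *ctx, size_t size);
    void (*release)(void *ctx, void *ptr);
    void *ctx;
} allocator;

/* Example backend: forwards to malloc and counts the calls in *ctx. */
void *counting_alloc(void *ctx, size_t size)
{
    ++*(size_t *)ctx;
    return malloc(size);
}

void counting_release(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Duplicates s using the caller's allocator; returns NULL on failure. */
char *dup_string(const allocator *a, const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = a->alloc(a->ctx, n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Usage: the caller chooses the allocator and can observe its use. */
void allocator_demo(void)
{
    size_t calls = 0;
    allocator a = { counting_alloc, counting_release, &calls };

    char *copy = dup_string(&a, "hello");
    if (copy != NULL) {
        assert(strcmp(copy, "hello") == 0);
        a.release(a.ctx, copy);
    }
    assert(calls == 1);
}
```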
> C is full of quirks. From cryptic "undefined behaviors" to a type system that isn't really a type system (more like "size hints for the compiler"), the language doesn't feel easy to use/debug.

I guess because I just don't agree with this viewpoint at all. I've been writing C on and off for over 20 years now and I simply haven't encountered the amount of distress and pain that I see others deal with, especially when related to memory handling or undefined behavior.

I wrote a piece of software in Win32 C for a gas integration company many years ago that did tons of string manipulation to recalculate reports coming out of another piece of software. It even included a custom built on-disk database which basically ended up being my own version of BDB. Scratch that, I wrote this software twice because my first version was lost in a disk crash and I had to hex dump the database format to recover my original implementation.

Last I recall that software ran at that company for over a decade and probably helped them make millions in revenue. I didn't have a single support ticket and to be honest the last time I talked to the owner I thought they had just stopped using it. I was very surprised that they were still very happy with it and it was working fine.

That's just one of many examples of projects I've built or debugged in C. I've regularly been able to fix issues in OS drivers, large projects like Asterisk, and things like deadlocks in toolkit-based GUI programs. It's actually easier for me to use C than most other programming languages because it's clearer to me what should be happening, especially when dealing with anything systems-related.

That's just my experience. I totally get that others don't share that same experience but to be honest I'm pretty tired of seeing all of the confused hatred for C.

Adding that anything I wrote in C++/MFC at that time is now obsolete.

Everything I wrote in C/Win32 is just as fresh as it was 30 years ago.

While I understand the sentiment, MFC is still being maintained, and is in fact still the only C++ GUI framework worth using, being shipped in the latest Visual Studio (2022).

Well... I'm in embedded systems. In embedded systems, you almost never change compilers. You usually don't even upgrade the compiler. Whatever the compiler is for a project, that's what it will be for that project forever. And in your case, it sounds like you only compiled that code with one compiler.

But as far as UB goes, that's cheating. We're playing on "easy mode". We know what that compiler is going to do, and that's all we need.

"Hard mode" for UB is when you have to worry about what a different, unknown, perhaps not-yet-written compiler is going to do with your code. What is the absolute worst that a compiler could do, within the rules, to your code? You and I don't worry about this, and it doesn't bite us. People writing library code do have to worry about it far more than we do.

So I agree that the concern is overblown. But I think that maybe we miss that it's a real concern, because it doesn't hit us.

Because everything speaks C.

If you write a library in C, it can be easily exposed to a variety of high-level languages and platforms.

You might argue this is more a property of the C ABI than of C itself, but unless the project is large enough that it's worth doing it in C++ or Rust instead, it's still a very reasonable choice.

Also not everything is web. Sure, if you're writing API endpoints in C you're just shooting yourself in the foot, just use Python or Ruby or Go and call it a day. For things like embedded it's often your only reasonable choice.

And since C has become more than just a language and is actually a protocol (see https://faultlore.com/blah/c-isnt-a-language/), you sometimes need to know the inner workings of C even when you write in other programming languages (C++, Rust, Swift, Zig, even Python, etc.).

So we're gonna be stuck writing a precambrian prototype language till the end of time because there's so much legacy code already written in it? Never seemed to stop people moving from Pascal, or Perl or literally all other languages that are now obsolete.

I really hate how for microcontrollers the only two choices are either C++ or Micropython, I mean how about some fucking middle ground instead of two polar opposites? At least eventually everything will be rewritten in Rust I guess.

> I really hate how for microcontrollers the only two choices are either C++ or Micropython

Why wouldn't you just use C for programming a microcontroller? Sure, it's not a great language for web backends, but microcontrollers are where it shines. You're probably not deploying 100,000 lines to a microcontroller for a personal project, so the lack of certain abstractions isn't going to be that painful. On the other hand, C lets you make the latency and memory usage 100% predictable, which can be a great asset.

Why wouldn't you use assembly for programming a microcontroller? Sure, it's not a great language for web backends but microcontrollers are where it shines. /s

Because as the OP states, it's an objectively (pun intended) terribly abstracted language. There is nothing 100% predictable about C except that you'll eventually get screwed because you didn't account for some random obscure thing that should never have even been possible to do. Any language that allows using static variables can have predictable memory consumption. There is nothing inherent to it that makes it better than a language that works at the same level but built to modern standards, except the piles upon piles of legacy code you can use.

Enable max warning level, use a static analyzer, and ASAN, UBSAN and TSAN (in order of importance), and most problems you listed just disappear. Most importantly though: don't use MSVC if you have the choice.

Yeah, if you want to kill yourself from frustration, maybe. I'm not writing microcontroller code for the fucking space shuttle, and I would suspect most people aren't.

C did a ton of things right, but it also did a ton of things wrong. Learning from that and moving on would be the sensible thing to do after 50 years.

> There is nothing predictable about C except that you’ll eventually get screwed …

This has been the exact opposite of my experience. I’ve been writing C for 10 years and have yet to find a piece of code where I was surprised at what it did. That’s one thing I love about C: it is entirely predictable. If it isn’t, my code is wrong. The language is rigorously specified. It is not hard to avoid undefined behavior.

Contrast that with languages like C++ or Python which hide gotchas all over the place. In Python, one cannot even rely on a variable being a certain type, and if it isn’t, the program explodes. C++ allows plus to not be the inverse of minus, and allows for hidden custom memory allocators (overloading the new operator). Template metaprogramming is borderline sorcery past the simplest of use cases. C++’s interoperability with C is an accident waiting to happen with all the reallocations which can occur without the user being aware.

C lays flat out in front of the programmer all the unpredictable behavior that many other languages implement behind the programmer’s back. Sometimes that’s not desirable, and sometimes it is.

I agree with your point about Python, which is why I'm glad type hints see adoption but dismayed that they're essentially fancy comments that don't enforce the actual runtime types.

The thing is, I'm not convinced avoiding UB is easy. E.g. what's the behavior of the following code?

    int16_t a = 20000;
    int16_t b = a + a;
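
One way to reason about the snippet above, sketched under the assumption that `int` is at least 32 bits wide: both operands are promoted to `int` before the addition, so `a + a` is computed as 40000 without overflow, and the surprise happens only when that value is converted back to `int16_t`. That conversion is implementation-defined rather than undefined. On a platform with a 16-bit `int` the addition itself would overflow, which is undefined. The function names below are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The operands are promoted to int, so the sum is computed in int. */
int promoted_sum(int16_t a)
{
    return a + a;
}

/* True when the promoted result no longer fits back into int16_t. */
int needs_narrowing(int16_t a)
{
    int wide = promoted_sum(a);
    return wide > INT16_MAX || wide < INT16_MIN;
}

void promotion_demo(void)
{
    int16_t a = 20000;

    assert(sizeof(a + a) == sizeof(int));  /* the expression has type int */
    assert(promoted_sum(a) == 40000);      /* assumes int is at least 32 bits */
    assert(needs_narrowing(a));            /* 40000 > INT16_MAX (32767) */
}
```

GCC documents that it reduces such out-of-range values modulo 2^16, which would store -25536 in `b`; other implementations are free to choose differently.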

> Never seemed to stop people moving from Pascal, or Perl or literally all other languages that are now obsolete.

Operating systems written in Pascal are now obsolete. OSs in C are not.

Perl is much easier to replace because fewer things were dependent on it; however, even here Perl 5.x still pops up all over the place.

Yeah to my great annoyance I did have to grep for an ipv4 address with a perl regex the other day. But for any actual scripting it's basically dead.
> for any actual scripting it's basically dead.

Run "file /usr/bin/* | grep -i perl | wc -l" on your computer. You will be surprised.

EDIT: if you want a histogram for all the types of programs in your system, run this

    file -bL /usr/bin/* | cut -d' ' -f1-3 | sort | uniq -c | sort

Embedded Rust has been a viable option for at least 4 years now and especially so for the past 2 years. I really dislike having to learn the quirks of building, configuring and navigating typical embedded C-based projects. They always seem to have an excessive amount of tiny files (in various languages) all over the place with obscure heuristics only the original authors know about. IMO, to build anything new your only reasonable option is to blindly copy and paste an example project and hack away. I’ve never been able to “start from scratch”.

An embedded Rust project is the same as a normal Rust project except that you mark it as not linking the standard library with #![no_std], and you define a main entry point and panic behaviour (there are helper crates for this).

You can still use the core and alloc crates which give you pretty much everything you need in an embedded system like strings and vectors. You also get to use modern tooling like VS Code and rust-analyzer instead of a different antiquated version of Eclipse for each hardware vendor.

I don’t think that Rust should only be used for big projects. You can use it for small projects and you really don’t need to get complicated with generics for application code. You need to put in the effort to get a fundamental understanding about what the borrow checker is trying to achieve and the rest may be easier than you think.

While it seems Rust supports ARM devices like M0, M4 and of course more powerful chips like those capable of running Linux, there are huge swathes of chips that it doesn't support like 8051, PIC etc.
> At least eventually everything will be rewritten in Rust I guess.

This is the new "Year of the Linux Desktop".

>I really hate how for microcontrollers the only two choices are either C++ or Micropython

There's TinyGo as well. https://tinygo.org/

I'd say that's the middle ground for me.

It is nice, but nowhere near as complete feature-wise as C/C++. The fact that it exists does not mean you can use it to achieve the same thing.

What do you mean nowhere near as complete feature-wise? Go or specifically the TinyGo implementation?

Seems to do exactly what 99% of people need.

Feature parity is fine but support is not quite there. Doesn't support WiFi on NodeMCU boards last I checked.
"Seems" is an outside perspective. There are loads of hardware features that it just doesn't support on various boards, and lots of extra hardware (like sensors) that it has no libraries for. It's not just the MCU/CPU that matters here.
There's a niche doing C++ (vs. straight C) on microcontrollers but the rest are just tinkerer choices.
> So we're gonna be stuck writing a precambrian prototype language till the end of time because there's so much legacy code already written in it?

Yes. Unless somebody steps up and rewrites everything in Rust or Lisp or whatever, that's exactly what's going to happen. Lack of backwards compatibility with existing software will condemn programming languages to irrelevance on day one.

Isn't Lua middle-ground enough? Alternatively you can write it in V and transpile to C.
Mainframes and microcomputers don't speak C, unless we constrain ourselves to their UNIX environment.

ChromeOS doesn't speak C, unless you mean shipping WASM libraries. (Not every Chromebook supports exposing the Linux environment).

iOS and Android, kind of speak C, but not if you care to actually ship an app.

I believe that with the arrival of ChatGPT and similar tools, writing code in C will become as easy as in any other language. The AI tools know how to generate good C code, and C is fast by itself. I believe we'll see a lot more code written in C now that we have new tools to analyze C code.
I have grown somewhat tired of these ChatGPT responses. It's a tool...not a panacea. C is a fantastic, albeit somewhat complicated, language. The problem is a C programmer knows the quirks and ChatGPT will dump you some code that could have undefined behavior depending on the compiler. Will ChatGPT always use restrict correctly (for example)?
Why not? You seem to underestimate the ability of AI tools to understand code. Undefined behavior is something that a good AI tool may avoid without major problems.
The issue to me is not the generation of code. It's that the person using it is inexperienced with the given language. We will never be able to place 100% faith in AI. At least in my lifetime. Given that, I think it's a relative danger that is washed away in all the hype. A junior dev copy-pasting code from chatgpt. I couldn't imagine a more dangerous combination.
Junior dev copy-pasting from stack overflow: this is already happening! Whatever bad thing AI tools can do, this is already reality all over the world.
> writing code in C will become as easy as in any other language.

I look forward to a raft of CVEs over the next decade where ChatGPT is a root cause...

Oh jeez, please don't bring AI into the discussion. AI tools will just repeat all the bad StackOverflow advice and hilariously terrible trial-and-error C code from student assignments.
> The AI tools know how to generate good C code

Are you sure about that? ChatGPT doesn't understand C. It wouldn't even have enough context to reason about UB even if it understood UB.

Microcontrollers exist. Their libraries are written in/for C. The programs running on them are small and need tight, efficient memory management.

I also like the minimalist nature of the language itself. I get that for desktop applications, you usually want more integration with the operating system so you can say "I want a window here and a button here" rather than having to manually build the window from scratch, but that's not something that's a concern in most embedded systems.

I'm operating in a world of voltage inputs and outputs, memory mapped devices, registers, flags, and timings... with almost nothing between me and the hardware. A simple language makes a lot of sense here.

Are the Arduino and ESP32 microcontrollers?

Hint, might check their libraries/SDKs before answering.

Arduino is a platform, not a microcontroller. ESP32 is technically a microcontroller, but it's an SOC... which is not the kind that generally gets used for industrial applications in the field I'm in.

You shouldn't assume I get to choose the platform I'm working on. That's not how it works where I'm at, and if (when) I do get to choose, programming language is unlikely to be near the top of the list of criteria.

Whatever you are forced to choose doesn't make the other options disappear from the market.
The other options aren't relevant to my comment.

If you're going to be pedantic, you need to be both relevant and correct. You are neither.

Don't think I'm too crazy but last time I checked:

1. Yes they are microcontrollers.

2. Yes they use C/C++. (check the libraries/SDKs, 1 layer under the hood it's all .h/.cpp files, and most of the arduino calls are just #defines)

So it isn't only C.
It absolutely is only C on the microcontrollers I'm doing work on.

I don't understand why you're trying to cherry-pick like this.

I wasn't the one making a universal truth out of it.

"Their libraries are written in/for C"

With all respect I think there’s a kind of false dichotomy implicit in your comment.

The availability of new tools with significant advantages over the old tools is almost always a reason to consider the new tools for certain use cases, but the new tools are rarely just strictly better on literally everything, there are generally now use cases when you say “the new tool is a solid fit here” and other cases where you say “the old tool still hits the sweet spot better”.

And that’s before you consider massive existing code and infrastructure and tooling and investment: which is very, very often a far higher order bit than C vs not-C.

A great example would be a JVM-caliber GC? That’s just such a win over malloc/free so often, but it doesn’t obsolete malloc and free across the board: it gives a thoughtful and mature team a whole new set of options.

Rust would be a (comparatively) recent example of a language that hits a lot of the sweet spots of e.g. C/C++ and brings some cool new stuff to the party, and might even represent a better default these days, but the idea that it strictly crushes them in full-stop everything is a political-style conversation not a reasoned engineering tradeoff conversation.

Even C++ which has been around forever and is give or take backwards compatible with C with good tools? Hasn’t obsoleted C.

More options is generally a good thing (there are exceptions).

> I used to think "C presents the most honest representation of the low-level mechanisms of the computer", but... even this is shaky. I've been programming for almost 15 years now, and I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address. The C representation of memory (and all the pointer arithmetic) is not a real representation of your hardware, and this too is an abstraction.

It's true that almost nothing works the way it's presented: the computer doesn't necessarily execute the statements you wrote, it executes the machine instructions the compiler produced. It also doesn't necessarily even execute them in the order specified. The memory isn't actually a big continuous space, it's mapped as virtual memory. The physical memory isn't used that way either: there's a hierarchy of NUMAed caches between the CPUs and the actual memory.

But it's a useful abstraction. Partly because a lot of the above things are built so that the abstraction works. But also because we want it to look that way, and it's kinda natural to let programmers imagine a virtual machine that works that way.

More importantly, it's also the abstraction that the CPU itself provides, not C. It'd be neat to be able to control all those things, but that's largely impossible, so I'll take the next best thing.
C presents a fairly honest representation of the low level mechanisms of x86 Assembly. The way Assembly has drifted away from actual CPU instructions is interesting, but not something a programmer will get much benefit from trying to deal with. Itanium was an interesting experiment, but the new set of instructions did not offer large gains in practice.
>>I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address.

I may be being pedantic or outright wrong (since it's been a while since I used C), but I don't think C can address memory by individual bit.

You have to read one or more bytes from memory, twiddle the bits in them using C's bitwise operators (like &, |, ^ and ~), and then write the changed bytes back to memory at the same addresses you read them from. At least for the earlier C versions I used, this was the case, IIRC.

And to read and write those bytes, you do it via scalar variables like ints or longs, or via structs or arrays, or via pointers. Or using library functions like memset().

Indeed, bytes are the smallest addressable unit, which is 8 bits in most architectures. You can't address a bit, so to do anything with it you have to get the byte it's in and twiddle.
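
The read, twiddle, write-back cycle described above can be sketched roughly like this. The helper names (bit_set, bit_clear, bit_get) are made up for illustration, and the sketch assumes 8-bit bytes, which the static_assert checks:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* C cannot address a single bit, so each helper first locates the
   containing byte and then masks within it. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Set bit number `bit` (counted from the start of buf, LSB first). */
void bit_set(unsigned char *buf, size_t bit)
{
    buf[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

/* Clear bit number `bit`. */
void bit_clear(unsigned char *buf, size_t bit)
{
    buf[bit / 8] &= (unsigned char)~(1u << (bit % 8));
}

/* Return 1 if bit number `bit` is set, 0 otherwise. */
int bit_get(const unsigned char *buf, size_t bit)
{
    return (buf[bit / 8] >> (bit % 8)) & 1;
}
```

Every call touches a whole byte: bit 10 lives in buf[1], and setting it is a load, an OR with 0x04, and a store.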
Why do programmers in 2023 need to imagine a virtual machine (basically a PDP-11 from 1970-something) at all?

You only need that abstraction if you're doing low level bit/byte bashing and I/O, or there's some chance you may run out of memory and need to handle that manually.

That applies to a tiny slice of all possible applications.

There are far more useful modern abstractions that don't need to make those assumptions.

> basically a PDP-11 from 1970-something

That PDP-11 from the seventies had ADC/SBC (addition/subtraction with carry) in its instruction set, the result of MUL was twice the size of the inputs (i.e., multiplying two ints produced a long), and DIV produced both the quotient and the remainder. None of that is visible from C and yet people keep clamoring that "C is close to the metal". Bah, humbug: while the "*p++" and "*--p" idioms translate directly into addressing modes particular to the PDP-11 (most other architectures don't have autoincrement/autodecrement), there is no specific support for "*++p" or "*p--" in the machine itself.
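
To be fair to both sides: C has no operator for a double-width product or a combined divide, but both can be spelled out in standard C. A minimal sketch (mul_wide and divmod are made-up names; whether the compiler turns either into a single instruction is entirely up to it, and the carry flag has no portable equivalent at all):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Full 64-bit product of two 32-bit values: widen one operand first,
   otherwise the multiplication is done in 32 bits and truncated. */
uint64_t mul_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Quotient and remainder from one call, via the standard div(). */
div_t divmod(int num, int den)
{
    assert(den != 0);  /* division by zero is undefined */
    return div(num, den);
}
```

div() truncates toward zero, so divmod(-17, 5) gives a quotient of -3 and a remainder of -2.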

Yeah that's true, and that's why people don't use C for stuff that isn't close to the metal. If you're just serving some web page you can just think about the business logic and a higher level language will deal with the rest for you.

But someone's got to write drivers and someone's got to write the thing that connects the higher levels to the metal.

Because when you are writing drivers for MCUs, you are writing into arbitrary pieces of memory at arbitrary addresses specified by the reference manual for your MCU. When you write 0xABCD into memory address 0xF120, your UART will throw out 0xA, 0xB, 0xC, 0xD on a pin using clocks defined by register 0xF124, which is actually a divider definition from the VCO connected to the XTAL.

No amount of abstraction under any language will isolate you from such memory model.
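
For readers who haven't done this, a rough sketch of what such a driver looks like in C. Everything here is hypothetical: the uart_regs layout, the function names and the 0xF120 base address only mirror the example above, and real code would take all of them from the MCU's reference manual. The functions take a pointer so the sketch can also be exercised against an ordinary struct on a PC:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block: data register at offset 0, clock
   divider at offset 4, like the 0xF120 / 0xF124 pair. */
typedef struct {
    volatile uint16_t data;     /* offset 0: writing here transmits */
    uint16_t reserved;          /* offset 2: unused in this sketch  */
    volatile uint16_t divider;  /* offset 4: clock divider          */
} uart_regs;

static_assert(offsetof(uart_regs, divider) == 4,
              "divider must sit 4 bytes after data");

/* On real hardware the block would be reached through a fixed address:
   #define UART0 ((uart_regs *)0xF120u)   (address is an assumption) */

void uart_init(uart_regs *uart, uint16_t divider)
{
    uart->divider = divider;
}

void uart_send(uart_regs *uart, uint16_t word)
{
    uart->data = word;  /* volatile keeps the compiler from dropping the store */
}
```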

Writing C code is fun and enjoyable. C programs are typically fast due to the use of primitives and low overhead. C's set of tools and abstractions typically forces you to think about how best to implement a particular data structure or interface, which is the kind of problem I most enjoy.

>I used to think "C presents the most honest representation of the low-level mechanisms of the computer", but... even this is shaky. I've been programming for almost 15 years now, and I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address. The C representation of memory (and all the pointer arithmetic) is not a real representation of your hardware, and this too is an abstraction.

Pointers are an abstraction, but they are less abstract than most languages simply assuming there is just one giant sheet of memory to take from.

> cryptic "undefined behaviors"

It's not really that cryptic (aside from like strict aliasing, but -fno-strict-aliasing). There's some UB that might be considered unnecessary/too strict, but it still makes sense in its own right, and, if understood, is quite powerful, and leads to a bunch of neat optimizations.

> the language doesn't feel easy to use/debug

If debugging at the assembly level, stepping by instructions, it's actually quite nice (despite what everyone says about it not mapping well to hardware, in my experience there's still a pretty clear & immediately obvious correspondence between each C thing and assembly subsection, and vice versa)

> CPP macros, a universally recognized bad idea

I don't know, they're quite neat for things I have to do. Sure, a turing-complete compile-time language would be nice (I'm not saying that sarcastically, I even use a DSL for writing SIMD that is exactly that!), but it'd add a ton of complexity to mapping C source to assembly.
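
One example of the kind of thing macros are neat for is the "X-macro" trick, where a single list generates both an enum and its matching string table. A small sketch (the COLOR_* names and both functions are invented for illustration):

```c
#include <assert.h>
#include <string.h>

/* The single source of truth: each entry is expanded twice below. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

#define AS_ENUM(name)   COLOR_##name,
#define AS_STRING(name) #name,

typedef enum { COLOR_LIST(AS_ENUM) COLOR_COUNT } color;

static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

static_assert(sizeof color_names / sizeof color_names[0] == COLOR_COUNT,
              "enum and name table must stay in sync");

/* Name of a color, or "?" for an out-of-range value. */
const char *color_name(color c)
{
    return ((unsigned)c < COLOR_COUNT) ? color_names[c] : "?";
}

/* Index of the color with the given name, or -1 if there is none. */
int color_from_name(const char *s)
{
    for (int i = 0; i < COLOR_COUNT; i++) {
        if (strcmp(color_names[i], s) == 0) {
            return i;
        }
    }
    return -1;
}
```

Adding a color means touching one line of COLOR_LIST; the enum, the table and the count follow automatically.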

> Also, documentation is all over the place. If a function isn't described in `man`, I have no idea where else to actually look for it.

Use of the standard library grows less and less significant as the size of the C project grows. Besides that, cppreference.com has pretty much everything.

And yeah, as others have said, a linear sequence of bytes is still a thing every CPU presents. Yes, there's cache & whatnot, but there's like precisely no way to usefully map that to any user-controllable/visible thing, because it's pretty much not user-controllable and intended to be invisible (and varies across all hardware).

> Sure, a turing-complete compile-time language would be nice

I wrote Metalang99 [1] as a compile-time language that is able to perform loops, recursion, etc. It's not Turing-complete though, as the C preprocessor is not Turing-complete.

[1] https://github.com/Hirrolot/metalang99

> From cryptic "undefined behaviors" to . . . [the] lack of a single reference implementation of the compiler/libC, and you have a language that is harsh to defend.

I think you're confused, because this is internally incoherent.

In single reference implementation languages, all behavior is undefined behavior. Undefined behavior is just behavior for which there are no requirements imposed by the international standard. It's an unbounded form of implementation-defined behavior.

Undefined behavior does not mean that the behavior is completely unpredictable. It does mean you should read your compiler's documentation (including tweaking what happens with certain common UB). For example, if you want signed integer overflow to always wrap, and you read the GCC or Clang documentation, you'll know to use -fwrapv. If overflow could cause catastrophic failure and the program should abort if it happens (e.g., Therac-25), you'll know to use -ftrapv. There's nothing wrong with writing to an arbitrary memory address, either, if you've read your documentation and that's how your environment communicates with a particular I/O port.
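
If you'd rather not depend on compiler flags at all, the overflow can also be checked before it happens, in plain standard C. A minimal sketch (checked_add is a made-up helper name, not a library function):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b without ever executing an overflowing signed addition.
   Returns false (and leaves *out untouched) if the sum does not fit. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if (b > 0 && a > INT_MAX - b) {
        return false;  /* would overflow upwards */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;  /* would overflow downwards */
    }
    *out = a + b;
    return true;
}
```

The comparisons themselves cannot overflow: INT_MAX - b is only evaluated for positive b, and INT_MIN - b only for negative b.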

> People who still write C, honest question: Why?

Because loops are fast.

I do scientific computing, where many people use python nowadays, and a few years ago it was matlab/octave. These languages feel "cramped" because they artificially force you to program in a certain way in order to avoid loops. While such a "vectorial" notation is often useful, many algorithms are better expressed using a loop notation, and C does not impose an artificial distinction between the two notations: both are as fast as they can be. The fact that python is not an appropriate language for low-level numerical computation is evident when you notice that most numeric algorithms in python are just interfaces to code written in other languages (C, C++ and Fortran).

Of course, C is not the right tool for the job either... Modern Fortran is, objectively, the ideal language for low-level numerical computing: it has native multidimensional arrays and a lot of other goodies, which C lacks.

Julia would also be a nice alternative, and I check it regularly. But I find the current interpreter too quirky. I would love to see different interpreters/compilers for this lovely language!

C has no in-built way to deal with SIMD, which is essential for high-performance computing over loads of data. On that count alone it is already out of the game.
What are you talking about? "in-built"? Have you ever written SIMD assembly before? It's comically easy to integrate SIMD optimizations into a C program.
Through in-built assembly, or some compiler-specific annotation. None of them is vanilla C, which was my point.
Actual "standard C" (along with most of the C stdlib) is pretty much useless for writing real-world applications, any non-trivial C code base will almost certainly use at least a handful of non-standard extensions (sometimes even without knowing it) and both compiler- and platform-specific conditional code paths (just try how many libraries would compile with gcc's "-pedantic" flag, I bet it's not all that many).

This pragmatism by compiler vendors to just ignore the C standard where it doesn't make much sense, and to extend the language where it helps to solve real-world problems is actually a pretty powerful argument for C.

If you want truly high-performance, architecture-generic SIMD won't get you particularly far though - the set of things that x86-64 does and doesn't support is an utter mess, and doing things well across fixed-width and variable-width SIMD architectures will require compromises on one of those quite often. (not at all to say that it's impossible, it's just quite full of asterisks that I personally think is too much to bother standardizing)
Part of what makes C touted as a 'low level language' is the relative ease of inlining assembly.
Which isn't part of the standard, and no compiler is required to support to achieve certification.
gcc had emitted simd instructions since the egcs days.
So does JS, Java, whatnot. That’s not the point.
I needed to improve performance of some numerical computations in an existing Python script. The only choices felt like C and Fortran.

I tried Rust at first but went back to C when I realized I was spending more time appeasing Rust than solving the actual problem, which wasn't really complicated enough to gain significant benefit from Rust's features.

I am working on a translation of a game engine from Go to C with another coder. One of our end goals is to make it easily available via WASM in a web browser.

As to why work in C - it’s incredibly fast, it feels very powerful as long as we manage memory correctly. We use -fsanitize (the compiler sanitizers, such as AddressSanitizer), which can find memory leaks, buffer overruns, etc., and run it on all unit tests. I think -fsanitize is essential to have in your tool belt if you’re doing any C programming at all.

A pretty direct translation from Go to C ran at about 125% of the Go speed (i.e. the C code was 25% faster) and this was already very optimized Go code with no allocations. From Go to WASM the results were disappointing to say the least - WASM was about 32% the speed of Go and not at all easy to multithread (and a gigantic file). From C to WASM I got a much better 79% of native speed - would have wanted a little bit more, but this is much more doable, and we haven’t begun to optimize some parts of this engine yet. And Emscripten seems to have very good pthread support, which I will try soon.

> So, setting aside the need to maintain 30+ year old code, what would be modern reasons to start a new project in C?

C code written today will still be runnable 30+ years from now, and likely on whatever platform you're using, unlike code written in some flavor of the month language. C is standardized, has been ported to every architecture, and is easy to port in general, and there's so much code that's already been written in it that the inertia behind it is virtually insurmountable. I've invested significant time in other language ecosystems (like Perl, coincidentally also on the front page) only to see them eventually declared "uncool" (however productive) and killed-off by faddish HN types. But I'm confident they won't have similar success against C.

C is the real Hundred Year Language: http://www.paulgraham.com/hundred.html

Extremely minimal runtime, portability, and very low overhead when compared to other languages. I have a tiny statistics daemon that scrapes /proc and sends out multicast packets, and it builds and runs on everything from ARMv5 to Xeons, barely showing up on any kind of resource meter and with an absurdly small binary size.

I considered rewriting it in Go a couple of times but just didn’t see the point.

I like making things (air quality monitors, web NFC login, automated garden, power monitor, etc.) with microcontrollers like the Raspberry Pi Pico, where the only real choices are C/C++ or some flavor of Python. I really do not like Python; it rubs me the wrong way for some reason. Also, I can find libraries for all the components/sensors in C/C++.

It's not so bad. Manipulating strings is a pain in the ass so everything becomes a char and managing types is so annoying, especially dealing with functions that could easily take an int or float: you either have to make a template or different versions of the function for each type. This makes me appreciate dynamically typed languages a lot. Those two issues are the only problems I seem to have, everything else has been easy and breezy.
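
If you end up on the plain C side of "C/C++", C11's _Generic at least hides the per-type versions behind one name. A rough sketch, where clamp and the three clamp_* functions are invented names:

```c
#include <assert.h>

/* One version per type still has to exist... */
int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

float clamp_float(float v, float lo, float hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

double clamp_double(double v, double lo, double hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* ...but callers only see clamp(): _Generic picks the function at
   compile time from the type of the first argument. */
#define clamp(v, lo, hi)          \
    _Generic((v),                 \
        int:    clamp_int,        \
        float:  clamp_float,      \
        double: clamp_double)((v), (lo), (hi))
```

It is still more typing than a template: the dispatch is free at runtime, but the bodies have to be written (or macro-generated) once per type, and a type missing from the list is a compile error.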

Besides those two things it's pretty nice. My code is a bit verbose because I'm not that great at it but I'm sure I could reduce the lines of code in my projects (the biggest one has 4000+ lines of code, but it does a lot) by using structs and more loops, but that's mostly a skill/experience issue.

You had many answers.

You don't really start a project in C unless you target limited hardware or some low-level library that can be embedded in other things and interact with other languages that can make use of C-style APIs.

C became the "new assembly", meaning it sort of replaces the role assembly had. The chips that are sold are not programmed in assembly, because they're sold with a C compiler target directly.

C is more than a programming language, it's a universal glue, so it often makes sense to use C because it gives access to everything. It's like English: you can't expect to use Esperanto just because it's a superior language. Programming languages are the same.

Disclaimer: I mainly use python and C++.

Honestly because I don't want to learn another language.

And because most of the world uses C for low level stuff. You can say that Esperanto is a much better international language than English but what good does it do if nobody speaks it?

Veering off topic, there is a great rant about why Esperanto is a horrible international language: https://web.archive.org/web/20110515155117/http://www.xibalb...
Justin Rye's site is now at http://jbr.me.uk/ (and the espe-ranto at http://jbr.me.uk/ranto/ )
Quite simply there haven't been any candidates so far which both got the "essence" of C and had the momentum to actually replace C. Zig looks like the most promising so far, if they don't fuck up on their way to 1.0

(disclaimer: I switched back from C++ to C as my language of choice for writing libraries ca 2017, but also continue to write C++ (if necessary to talk to C++ libs) and a lot of Python and Typescript for simple cmdline tools and web stuff, also ObjC on Mac of course for talking to system frameworks, in recent years dabbled with Rust, Odin and Nim, in the long distant past also with C#, Java, Lisp and some Forth, and eventually hope to transition over to Zig for the stuff I currently use C for (maybe in 3..5 years?))

TL;DR: use the language that suits a problem best, and C is a very good tool to have in any language toolbox, because it can usually provide a solution where other languages have to give up or just become too much of a hassle (for various reasons)

> The C representation of memory (and all the pointer arithmetic) is not a real representation of your hardware, and this too is an abstraction.

By and large memory is a contiguous array and the C representation closely matches what is actually happening, so I am curious about which platforms you have worked on.

Tagged memory architectures don't match the C model of linear memory. They're essentially obsolete now but C is still designed to accommodate them.

A lot of the UB that people grouse about can generally be ignored because 99% of the platforms out there have the same behavior in areas where the standard is extra permissive for obsolete exotic hardware. Tagged memory is dead, 1's complement is dead, big-endian is mostly dead. All the UB associated with them is not that relevant most of the time. The downside is that people write code that takes a lot of liberties assuming behavior that the standard doesn't guarantee. A common one is unaligned access because x86 has always been permissive about it and it took until C11 to get the power tools needed to manage it in the language.
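
For the unaligned access case, a small sketch of the portable way to do it, plus the C11 alignof query. load_u32 and is_aligned_u32 are hypothetical helper names, and the alignment test assumes the usual flat mapping from pointers to uintptr_t values, which the standard does not promise:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loads a uint32_t from any byte offset. A direct
   *(const uint32_t *)p would be undefined if p is misaligned,
   even on hardware that tolerates it. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Nonzero if p is suitably aligned for a uint32_t (assumes pointer
   values convert to linear addresses, as on mainstream platforms). */
int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % alignof(uint32_t)) == 0;
}
```

On x86 and most ARM targets a decent compiler turns the memcpy into a single load, so the portable version costs nothing there.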

The UB problems have no relation to the machine behavior. UB exists only on the compiler.

The fact that many C developers keep confusing it with implementation dependent behavior gives me no confidence on their other opinions about the language.

> UB exists only on the compiler.

Sure, but UB exists so that compilers can generally do whatever's fastest on that particular architecture.

Your signed add instruction traps? Great, do that. Does one on a different architecture overflow? Fine. Just emit it and it's conformant.

Does the C machine model in fact consist of a single linearly addressable memory space? I think the spec mostly talks about "objects" that are linearly addressable -- not about the whole "memory" (there might not even be such a thing). Technically you aren't even allowed to compare two pointers other than for equality (relational comparisons are possible only within the same array). Just making up pointers is probably already a stretch of the spec, although you'll see lots of that in e.g. embedded projects.

(Disclaimer: I really don't know all the details of the C standards, am not a language lawyer but know enough about the language to feel quite productive in it. Please fill me in or correct me where I'm wrong).

People will abuse [u]intptr_t to compare addresses from different objects. There is no guarantee that the integer value stored in such variables is representative of a linear memory space and you're supposed to treat them as opaque data but most platforms permit such comparisons. All you're permitted to do is cast a pointer into those types and cast it back to the original type.
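
The one thing that is guaranteed looks roughly like this (round_trip and round_trip_demo are made-up names; the standard words the guarantee in terms of void *, hence the extra casts):

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips an object pointer through uintptr_t. The integer in
   between is opaque: nothing says it is a linear address. */
int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* Small demonstration: the pointer that comes back compares equal to
   the original and can be dereferenced as before. */
int round_trip_demo(void)
{
    int value = 42;
    int *back = round_trip(&value);
    assert(back == &value);
    return *back;
}
```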
A couple points:

- CPU memory subsystems are very complex these days and represent a lot of shared mutable micro-architectural state, which makes it hard to reason about. That's not linear and the C language does not offer concepts which represent that complexity. Short of some prefetching intrinsics.

- Pretty much all memory will be virtually addressed, pushing you even further from the concept of flat linear memory.

- Pointer provenance [0] binds pointers to types and allocations, which doesn't map onto the idea that memory is linear and a pointer is just an offset.

[0] https://faultlore.com/blah/fix-rust-pointers/

AFAIK C's machine model is not that linear (see my other comment). On the other hand, what most CPUs offer as an abstraction (through their instruction set) is very much so.

There are a couple of arguments like that floating around and it just doesn't make a whole lot of sense. The C model is in fact a usable abstraction (and easy enough to peel off when required), otherwise it wouldn't have stuck around for so long. No amount of "network effects" and "free beer" arguments can discuss this away.

There is an argument that instruction sets might have developed a linear address space abstraction because of C, but I doubt it. Binding the IR closer to a specific physical layout would be very bad for portability and longevity of the code.

> the C representation closely matches what is actually happening

It really doesn't, though. Although your CPU might present system RAM as one contiguous array of bytes to your program, the C compiler follows different rules – see strict aliasing and other pointer dereference rules. For example, the following is Undefined Behavior and your C compiler may or may not generate the assembly you expect:

    int x = *(int *)0x12345678;
Your CPU would happily execute the equivalent machine instructions and load from address 0x12345678, while a C compiler is free to replace your entire program with return 0;
Casting an integer to a pointer is implementation defined, not UB.

And every sane implementation does what everyone expects because its how memory mapped IO works (but you probably want a volatile in there and maybe a compiler or memory barrier as well depending on what the hardware guarantees about the access patterns for that particular range of addresses)

> Casting an integer to a pointer is implementation defined, not UB.

You're right, that was a bad example. Here's a better one:

    int x, y;
    ptrdiff_t diff = &x - &y;
This is Undefined Behavior, because &x and &y don't point to the same object.
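
For contrast, the same subtraction is well defined when both pointers point into one array (or one past its end). A minimal sketch, with index_of as a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Index of elem within arr. Defined only because both pointers refer
   to elements of the same array object (or one past its last element);
   the caller is responsible for making sure that is true. */
ptrdiff_t index_of(const int *arr, const int *elem)
{
    assert(arr != NULL && elem != NULL);
    return elem - arr;
}
```

The result counts elements, not bytes: for an int array, index_of(a, &a[5]) is 5 regardless of sizeof(int).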
The original author was talking about hardware not behaving like linear memory, and other than caches and maybe some thread local tricks, I'm not sure what he meant. However, it seems pretty clear that CPUs do try really hard to make:

    mov rax, qword ptr [0x12345678]
do what you think it would/should.

And as for the C memory model, aliasing, and optimizations, I'm firmly in the camp that thinks the standards originally gave the compiler writers an inch to work on weird platforms and they've taken a mile when they work on reasonable ones. The intent of your integer to pointer cast is very clear, but it's been undefined to insanity. So now there is some variant of the following, which doesn't have UB but does the exact same thing less clearly:

    uintptr_t i = 0x12345678;
    int* p = 0;
    memcpy(&p, &i, sizeof(int*));
    int x = *p;
I'm sure some language lawyer will correct me on some obscure detail of the standard, but it could be fixed with some modification. The point to me is that using memcpy instead of pointer casts is NOT an improvement. The good compilers will generate the same code as the assembly above, so all they've done is made the C source less readable.
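
For anyone who hasn't met the idiom, this is the usual shape of "memcpy instead of a pointer cast", here for looking at the bits of a float. float_bits is a hypothetical helper, and the values mentioned afterwards assume IEEE 754 floats, which C does not require:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Reinterprets the bytes of a float as an integer. memcpy is allowed
   to copy any object's bytes, so this sidesteps the aliasing rules
   that a *(uint32_t *)&f cast would break. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3F800000. Whether this reads better or worse than the cast is exactly the disagreement in this thread.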
> The point to me is that using memcpy instead of pointer casts is NOT an improvement.

The improvement comes when there are multiple accesses that could potentially point to the same memory. Consider a silly function:

    void f(int16_t* a, int32_t* b) {
      for (int32_t i = 0; i < 100; i++) {
        b[i] = a[0] + i;
      }
    }
If type-based alias analysis is enabled, then the compiler can assume that a[0] does not alias b[i] because they are different pointer types. So it can hoist the load of a[0] outside the loop, improving efficiency. If strict aliasing is disabled, it cannot assume this, so it must reload a[0] each time: https://godbolt.org/z/E7jxfYsbx

The memcpy() makes it clear that the memory could alias anything, so it will generate the less efficient code even if strict aliasing is enabled: https://godbolt.org/z/KoPxK9fPj

Memory aliasing is a huge thorn in the side of the optimizer, because the compiler frequently has to allow for the possibility that different pointers will alias each other, even if they never will in practice. The code might end up being slower than necessary for no real reason. Strict aliasing is one of the few tools we have to tell the compiler that aliasing will not occur.
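
The other tool in that box is restrict, which covers the case where the pointers have the same type and type-based analysis has nothing to go on. A sketch of the same loop (fill_from_first is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both pointers are int32_t here, so strict aliasing cannot separate
   them. restrict is the programmer's promise that a and b do not
   overlap; calling this with overlapping arrays is undefined. */
void fill_from_first(const int32_t *restrict a, int32_t *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        b[i] = a[0] + (int32_t)i;
    }
}
```

With the promise in place the compiler is allowed to load a[0] once before the loop, but nothing checks the promise for you.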

I don't think that C actually forbids this code:

     *(int*)0x12345678
The rule is just: if you access it as an int, you have to consistently access as an int. You can't mix types from one access to the next, eg:

    *(long*)0x12345678
    *(int*)0x12345678
> Strict aliasing is one of the few tools we have to tell the compiler that aliasing will not occur.

I can see the argument, but there's a much better way to indicate what you want with your example:

    void f(int16_t* a, int32_t* b) {
      const int16_t a0 = a[0];
      for (int32_t i = 0; i < 100; i++) {
        b[i] = a0 + i;
      }
    }
Now a clean (well defined) compiler could do what you asked.

I've seen other people suggest that UB is a mechanism to have these magical backdoor conversations with the compiler to express optimization opportunities. I think that's absurd and reckless. Propose adding assertions or "declare" statements instead, and quit thinking of interpretive dance through a minefield as a method of communication.

You are entitled to your opinion. C isn't perfect, but as someone who spends my life trying to optimize the efficiency and code size of critical loops to the max, I like the direction C has gone with UB and optimizations. It's not the right tool for every problem, but for the most size/speed critical code it's hard to beat IMO.
> I don't think that C actually forbids this code:

     *(int*)0x12345678
If not, give it time. It was only a few years ago when you were allowed to use a union for that kind of thing. I really believe they'll eventually make everything except unsigned integers be UB.

"Oh, the code was never correct. You just got lucky before."

If you want to load from address 0x12345678, assign it to a char pointer first. Then the cast is legal and defined.

Your point that C is stricter than asm of course still stands.

> and load from address 0x12345678

and most likely seg fault, or similar

1. If the CPU lacks an MMU and the address falls into an accessible address space, it won't segfault.

2. If the CPU has an MMU, it won't segfault if the address is mapped to an accessible region of memory.

3. This is beside the point, because the CPU will execute the instruction and attempt to load from that address. A C compiler might emit the load instruction, or it might assume that this code branch will never be executed and can therefore be replaced with code that sends an angry email to your mother.

Only if you ignore memory layout and understand the UB on your platform. The language does not make as many guarantees as "C is simple" folks seem to think. Throw a sanitizer at any of their code, and you'll see unaligned memory accesses all over the place.
Most modern hardware makes use of registers and multiple layers of caches, and you need a ton of UB to justify the compiler making use of them.
Registers are orthogonal with memory layout. Cache does not change the general model.
> Add to this CPP macros, a universally recognized bad idea

I don't think it is a bad idea. You can't solve language incompatibilities in the language itself. Textual macro languages solve this nicely.

CPP is what makes C and C++ work for projects aimed at multiple platforms or compiler vendors.

Yes, you can. Two approaches:

1. Multiple implementations providing a unified interface, selected by the build system. Aka the Henry Spencer approach: <https://www.usenix.org/legacy/publications/library/proceedin...>

2. Less-bad macros, e.g. cond-expand: <https://weinholt.se/articles/cond-expand-and-ifdef/>

Option 1 only works if there is a sensible unified interface, and if you feel like spending the time making that for what could be just one line per target. And it just won't work for things that don't really "have an interface", i.e. conditionally adding an __attribute__((optnone)) to a function that a specific compiler version gets stuck in an infinite loop optimizing, or macros that expand to some _Pragma-s that apply to a loop following it for controlling unrolling/vectorization if available, or managing custom inlining configurations for functions based on the optimization/debug levels, or defining a type as either 32-bit or 64-bit depending on requirements, or redefining all printf & fprintf usages to something mingw-friendly.

Many of those could be solved by some other means, but C macros neatly encompass all of those.
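
A minimal sketch of the kind of one-line-per-target glue described above. `MY_NOINLINE`, `MY_USE_64BIT`, `my_index` and `clamp_index` are hypothetical names invented for this example; the compiler-detection macros and attribute spellings are the usual GCC/Clang and MSVC ones.

```c
#include <assert.h>
#include <stdint.h>

/* MY_NOINLINE and MY_USE_64BIT are hypothetical names for this sketch. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#else
#  define MY_NOINLINE /* unknown compiler: expand to nothing */
#endif

/* Pick the index width at build time, e.g. with -DMY_USE_64BIT. */
#ifdef MY_USE_64BIT
typedef int64_t my_index;
#else
typedef int32_t my_index;
#endif

/* Clamp i into the range [0, len - 1]. */
MY_NOINLINE my_index clamp_index(my_index i, my_index len) {
    assert(len > 0);
    if (i < 0) return 0;
    if (i >= len) return len - 1;
    return i;
}
```

Neither the attribute nor the type width has a natural "interface" to hide behind, which is why the build-system approach fits them poorly.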

Because most of the projects I want to work on are in C. Postgres, the Linux kernel, lots of legacy systems stuff. All the foundations of our field are in C, so that's what I use when I want to contribute or study them.
Longevity. As sure as eggs is eggs, reasonable C that I write today will be compilable in 30 years time. Python? Breakage every couple of minor versions.
It's more that it's the most honest representation of the assembly/machine code. We can't really get closer to the hardware than the interface the CPU offers, and C then sticks pretty close to that (or a subset of it, I suppose).

It's the simplicity and power of C that I find attractive. I don't write it professionally at the moment, but I enjoy it. It's obviously not the right tool for the job most of the time for the reasons you give, but I miss its elegance.

I am a big fan of rust, but it's massive compared to C. I'd like to explore Zig some day.

> People who still write C, honest question: Why?

It was my first programming language and I still think it's a simple and fun language. Also many things have a native C interface so it's a natural choice in those cases. It's certainly not the only language I use, but for many things my default. What's nice is that I don't have to consciously think much about the language when I use it because I know it well.

It's kind of like English. English is in some sense a simple language (grammar), a poor mixture of other languages and its orthography is not good (not an elegant language), but everyone speaks it.
It’s a very simple and explicit language that is easy to write high performance code with and can be used as high level, portable assembly which integrates easily with actual assembly due to a simple and stable ABI. It compiles extremely quickly, its tooling is mature and robust, and you can write it for any platform and do basically everything with it because it is a lingua franca where almost everything has a native API that uses the C ABI.

C’s type system is lacking, I wish it was more strict, and sometimes I wish it had some features from C++ (operator overloading for mathematical types, templates for generic programming) and features of other languages (multiple return values especially), but overall I’m okay with its limitations and have become used to working around them. Sometimes I compile C code with a C++ compiler just to take advantage of stricter typing, templates, etc. but for a lot of projects this isn’t a necessity.

> I don't think I've ever seen a computer where memory is actually a continuous array of bits sorted by memory address

well, that is not the C memory model. C does not allow you to access bits in memory directly. maybe you meant bytes? or words? if so, many cpus have exactly that architecture.

> C does not allow you to access bits in memory directly.

of course it does what are you talking about?

Not the commenter you're replying to, but I suspect what they mean is that the C memory model is byte-addressable not bit-addressable. You can't point/refer to a specific bit in memory, instead you have to first read the byte and then select an individual bit using bitwise operations, much like most modern processors.
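
To make that concrete, here is a minimal sketch of reading and writing a single bit through a byte pointer. The helper names `get_bit` and `set_bit` are made up for this example; bits are numbered from the least significant bit of byte 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Read bit number `bit` (0 = least significant bit of byte 0). */
int get_bit(const unsigned char *mem, size_t bit) {
    assert(mem);
    /* Fetch the containing byte, then shift and mask out the one bit. */
    return (mem[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
}

/* Set or clear bit number `bit` with a read-modify-write of its byte. */
void set_bit(unsigned char *mem, size_t bit, int value) {
    assert(mem);
    unsigned char mask = (unsigned char)(1u << (bit % CHAR_BIT));
    if (value) {
        mem[bit / CHAR_BIT] |= mask;
    } else {
        mem[bit / CHAR_BIT] &= (unsigned char)~mask;
    }
}
```

The smallest thing a pointer can name here is the byte; the bit only exists as the result of the shift and mask.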
That has nothing to do with the C memory model, but how the CPU is structured. No modern CPU has an interface for bit-address accessing as far as I am aware...

C makes no assumptions about the size of a byte

C doesn't really know about bytes. It has chars, but I believe there are some constraints on char, specifically, they have to be big enough to hold the ASCII charset. (I'm pulling real deep here, someone correct me if I'm wrong)
C11 3.6p1 byte "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment"
If I remember correctly, it assumes the size of a char is greater than or equal to seven bits, and a char is defined to be the smallest addressable unit.

C does not support bit-addressing.

The width is defined as CHAR_BIT >= 8 (C11 5.2.4.2.1p1). The size, sizeof (char), is always 1.
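
Both guarantees can be checked at compile time. A minimal C11 sketch; `bits_in` is a made-up helper name:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* These hold on every conforming implementation, so they never fire. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Width in bits of an object that is `size` bytes long (padding included). */
size_t bits_in(size_t size) {
    return size * CHAR_BIT;
}
```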
I wouldn't consider accessibility (via masking & shifting or struct bit fields) to be on the same order as the byte-level addressing you get with pointers.
bits are not addressable in C and are thus not directly accessible.
They are also not normally directly addressable by the CPU, you'll have to do some combining and splitting with separate instructions. Some CPUs are better at this than others.
Tangent: some Arm Cortex-M class CPUs had a feature called "bit-banding" where you could do byte accesses to an area of the address map and the CPU would turn these into bit accesses to a different part of memory. So the alias word at 0x23FFFFFC maps to bit [7] of the byte of RAM at 0x200FFFFF, for example, and you can do a word write to 0x23FFFFFC to change just that bit 7, saving having to do it by hand (which is particularly awkward if you need to ensure the atomicity of the bit update).

https://developer.arm.com/documentation/100165/0201/Programm...
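
The mapping in that example is plain arithmetic. Here is a minimal sketch, assuming the Cortex-M3/M4 SRAM layout (bit-band region starting at 0x20000000, alias region starting at 0x22000000); the macro and function names are made up, and other parts may use different addresses or lack the feature entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M3/M4 SRAM bit-band layout; check your part's manual. */
#define SRAM_BITBAND_BASE 0x20000000u /* start of the 1 MB bit-band region */
#define SRAM_BITBAND_END  0x200FFFFFu /* last byte of that region          */
#define SRAM_ALIAS_BASE   0x22000000u /* start of the 32 MB alias region   */

/* Address of the alias word that maps to bit `bit` of the byte at `byte_addr`.
   Each byte takes 32 bytes of alias space: one 4-byte word per bit. */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit) {
    assert(byte_addr >= SRAM_BITBAND_BASE && byte_addr <= SRAM_BITBAND_END);
    assert(bit < 8);
    return SRAM_ALIAS_BASE + (byte_addr - SRAM_BITBAND_BASE) * 32u + bit * 4u;
}
```

With these assumptions `bitband_alias(0x200FFFFF, 7)` gives 0x23FFFFFC, matching the example above. On real hardware you would then write through the result as a `volatile uint32_t *`; the function itself only computes the address.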

I wouldn’t quite count it as bit-addressing, but x86, for example, can load bits directly into the carry flag using the BT instruction which can take a register or memory address as its first argument, with the bit being given as the second.
There have been all kinds of variations on that theme. One of the nicest is 'bit test and set' as an atomic instruction, that one enables a whole raft of nice stuff.
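
In portable C11 the closest equivalent is probably an atomic fetch-or. A minimal sketch; `test_and_set_bit` is a made-up name, and whether the compiler lowers it to a single bit-test-and-set instruction is up to the compiler and target.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit `bit` in *word and report whether it was already set. */
bool test_and_set_bit(atomic_uint *word, unsigned bit) {
    assert(word && bit < sizeof(unsigned) * CHAR_BIT);
    unsigned mask = 1u << bit;
    /* atomic_fetch_or returns the value the word held before the OR. */
    unsigned old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}
```

A false return means the caller was the one who set the bit, which is enough to build a simple lock or a "claim this slot" bitmap.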
Existing C examples from semiconductor vendors do not allow for other languages. OK, C++ is also used, but that’s it. So it’s a no-brainer to take the available drivers and build logic around them. That’s the current state of embedded development. The client does not pay for the use of modern languages.
I target microcontroller platforms, some of which only have a single compiler, usually some patched 20 year old version of GCC. The only possible alternative to C would be something that transpiles to ANSI C, given that some of these platforms don't quite have full C99 support.
It's the only language supported by basically all platforms, microcontrollers, GPUs, web-browsers. Although that is also almost true for C++ nowadays. I'm also curious which memory model would be superior in your opinion?
I use C for microcontrollers. I think Rust is making some inroads, but the libraries/tooling is not there yet.
Neither are the IDE tools, I feel; it's going to take a while, and Rust has been here for 17 years.
IDE tools like what? LSPs for Rust are on par with or better than those for C/C++, partially because the language is stricter, there is no #include nonsense, etc. Unlike C/C++, a sane build system and dependency management system that are universally agreed upon actually exist. What exactly is "going to take a while"?
I would assume debugger support. Rust is in a tough spot because a lot of code gets compiled away, and debuggers need to understand some Rust-isms for good experience, like enum support. I don't think this is an insurmountable situation, though.
People universally agree that replicating NPM's dependency hell was a good idea?
>it's going to take a while, and Rust has been here for 17 years.

Technically correct, but Rust was changing significantly from version to version prior to the 1.0 release some 8 years ago, notably the green thread runtime was removed.

Because I don't like "magic". I can understand the appeal of one-liners that do the work of 100 (or more) lines of C but that's just not what I like to do. I like to be in control. I don't like side effects. "Undefined behaviors" is propaganda. In my 15 years of programming in C, I never had an issue with "undefined behaviors". Things I created a decade ago still run like a champ on a damn coin cell.
History.

You do what your operating system vendor does.

Quite a few operating systems have a C interface. The implementation of binaries (see also application binary interfaces) depends on the operating system.

Shared libraries (e.g., DLL) are binaries, too.

C compiler developers have the ability to generate consistent[1] binary outputs.

In simpler terms, vendors of these compilers can reach a consensus on how to convert C code into binary files, known as Application Binary Interfaces (ABI).

It is not uncommon[2] to have a foreign function interface in C.

1. http://yosefk.com/c++fqa/defective.html

2. https://learn.microsoft.com/en-us/cpp/dotnet/calling-native-...

There are a few reasons:

    The Lindy effect.
    You can run C code from anywhere.
    There are places where it is much easier and better to run C code than anything else.
All of these are related to how long C has been around. I think that's also the reason why we use JavaScript extensively.
C just feels good to read and write. Every other language suffers from not being C.
If you're doing anything in embedded systems/hardware, expect to be using C. Yes, Embedded Rust and MicroPython are a thing now, but if I need to work with any partner or customer I'll be in a world of pain, because 99% of that industry uses C. My customers start new projects in C every other week. If you need to be Processor independent, Portable, Performant, have access to Bit manipulation, and need direct control of Memory management, along with a massive ecosystem, C is almost the only option.
For me I use C because it's the de facto system programming language for both Linux and Windows. Another reason is that C grammar is simple (but has a lot of quirks I do admit.)
Professionally I do Python. From my experience the breakages occur due to an over-reliance on libraries to do trivial tasks. Do you find a different case?
Unfortunately I'm not professional enough to answer this question. I use C to learn system programming only and I never had the capacity to look at the kernel.
C can be coded much safer as long as I don't code in 'odd' ways, e.g. trying to be really smart with it. By following common-sense coding rules it seems pretty safe to me.

Like it or not, C might still be the most widely used language after 50 years. It will not go away; instead, future AI code review tools, static analyzers, and more powerful compilers will evolve fast to keep C safe and alive. Why? The price to replace it will be much higher in practice; it might simply be impossible.

Because there's no easier way to access various libraries.

Yes, that library does things with pointers the new language can't prove are safe. It's been used for longer than you've been alive and it isn't changing. If a new language can't express what it's doing, well, the library isn't going to move, the language is. Therefore, I either have odd shims and contortions or I have C.

I await a Buzz Language to eventually have "inline C" the way C has inline assembly.

I don't write C as much as i used to but i still write a lot of it, including new code. The reasons are:

1. C is relatively simple. Sure, not as simple as it could be (e.g. compared to something like Oberon-07) but in the grand scheme of language things, it is far on the simpler side of the spectrum. I can write a C parser relatively easy if i want to for example (and at some point years ago i did that to transpile a C project to C# to run under Sony's PSM platform that was based on Mono and allowed only C#).

2. Undefined behavior is annoying as it can break previously working code with newer versions of the same compiler (though language lawyers playing word games like the code already being broken are way more annoying - the code did the thing i wanted previously so as far as i am concerned it was not broken), but this is something that aside from "obvious" things (accessing invalid memory) i can probably count on my fingers the times i encountered in practice (i write "probably" because right now i can't remember any case, but i've been writing C for more than 20 years). Valgrind and Ubsan help with these so they are not much of a practical concern.

3. I find CPP macros to actually be very useful and a feature that a) i'd actually like expanded instead of being stuck in the 80s (let me store some state or have a loop, FFS) and b) i wish were available in other languages too (Free Pascal is a language i also use and does have some C-like macro support, which is more than what you'd find in other languages but still not to the same extent as C). D's mixins essentially being ubermacros are a thing that i liked with that language but sadly their stance on breaking things is something that kept me away from it.

4. A C compiler is available on pretty much everything that can compute things - or at least on pretty much everything i might think on targeting with C anyway (and chances are there are multiple C compilers instead of just one). If not, i can probably write a compiler myself - it'd be rather simple and not that great but i'd be more likely to finish it than a compiler for some other language.

4b. Very related, so it gets a "4b" instead of 5 :-P, but there are a bunch of IDEs and editors that "understand" C. I like IDEs, i like syntax completion, i like semantic highlighting, i like being able to easily rename an identifier, etc and C being easy to parse (see #1) means it has a lot of those. Let me correct that, i don't "like" IDEs, i love IDEs.

5. Most modern computers might not technically be like how C presents them to be, but they're close enough where any differences only matter if you're trying to perform microoptimizations to your microoptimizations - at which point you'd most likely be using a combination of compiler-specific heuristics and assembly code anyway.

6. In most systems where that'd be a concern, the C ABI is pretty much stable or at least there is a stable C ABI, allowing any code written in C to be usable by other languages as well as shared libraries to be able to expose an ABI that will remain backwards compatible and usable by other languages. Of course other languages can do that but they pretty much always do it through a C-fication of their APIs.

7. C compilers - even those that perform a dangerous (see #2) number of optimizations - tend to be very fast. I hate waiting for the computer to finish doing things so i tend to prefer languages with fast compilers.

8. While i don't (always) need to maintain 30+ year old code, i do have existing C code that (seemingly, see #2) works and i don't see a reason to waste time rewriting that code in some other language. Even if it'd be broken chances are it'll be faster to fix it than rewrite it.

9. I am comfortable with C. For me being comfortable with a language is important because it lets me focus on the thing i'm trying to use the language for instead of the language itself.

There might be other stuff i forgot, but the above should give you an idea why i personally write C. Though note that i don't see it as any sort of perfect language, there are a lot of things i'd like it to do better - including the type system you mentioned as well as the compile-time code evaluation i wrote above, be it via CPP or by some other means - but it is good enough.

About 4., there is already a preprocessor loop proposal for clang[0] and I think I have also seen it in some obscure Qualcomm bluetooth chip compiler.

[0]: https://discourse.llvm.org/t/rfc-new-preprocessor-macro-dire...

> let me store some state or have a loop

This is arguably already possible, and was possible even before c99 added variadic macros. Although the code is a bit cumbersome to write.

It is technically possible in that C macros are supposedly Turing complete, but i mean i want something like being able to add a value to a variable, iterate through values (proper list would be neat but i'd be ok with a string of space-separated values), etc.
> It is technically possible in that C macros are supposedly Turing complete

It isn't Turing complete, because it will always terminate, but you can make the execution time (number of execution steps) arbitrarily large, exponential with respect to the number of source lines.

There are a few libraries that implement that.

https://github.com/rofl0r/chaos-pp: Quite high level implementation, that supports arbitrary precision decimal base arithmetic.

https://github.com/camel-cdr/boline: Mine implements 8/16/32/64 bit arithmetic, and low level control flow.

You can very often get away with using unary numbers and/or constant expressions to work around the limitations without needing a library.
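
As a rough illustration of the no-library route, here is a minimal sketch of a hand-unrolled, bounded "loop" built from chained macros. All macro names are made up for this example, and the upper bound is whatever you bother to write out (4 here); this is not how either library above is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Each REPEAT_n expands the previous one plus one more call of M, so the
   "loop" always terminates after a fixed number of steps. */
#define REPEAT_0(M)
#define REPEAT_1(M) REPEAT_0(M) M(0)
#define REPEAT_2(M) REPEAT_1(M) M(1)
#define REPEAT_3(M) REPEAT_2(M) M(2)
#define REPEAT_4(M) REPEAT_3(M) M(3)
#define REPEAT(N, M) REPEAT_##N(M) /* N must be a literal 0..4 */

/* Loop body: emits one initializer followed by a comma. */
#define SQUARE_ENTRY(i) ((i) * (i)),

/* Expands to { ((0) * (0)), ((1) * (1)), ((2) * (2)), ((3) * (3)), } */
static const int squares[] = { REPEAT(4, SQUARE_ENTRY) };

static_assert(sizeof squares / sizeof squares[0] == 4,
              "REPEAT(4, ...) should emit four entries");

/* Number of entries the preprocessor generated. */
size_t squares_count(void) {
    return sizeof squares / sizeof squares[0];
}
```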

Got any problem in mind? I've got some time on my hands to problem solve.

> It isn't Turing complete, because it will always terminate

Well, i wrote "supposedly" because i didn't try it myself but found a post[0] that claims it is. The example is even about making loops.

But the point is that these aren't only way too hacky but also slow down compilation. I did use some of my own preprocessor hacks when i wanted to do some fancy stuff with it at the past to implement an RTTI system that allowed automatic serialization of structs with nesting and references and while it worked (x-macros FTW), it was cumbersome and slowed down compilation so much that at the end i found it both much simpler and faster (in compilation time) to replace a ton of preprocessor macros with a code generator and a couple of #includes that included the generated code.

[0] https://stackoverflow.com/questions/3136686/is-the-c99-prepr...
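
For readers who haven't met x-macros, a minimal sketch of the general idea: one field list drives both the struct definition and a serializer. This is not the RTTI system described above (no nesting or references), and `POINT_FIELDS`, `struct point` and `point_to_string` are made-up names.

```c
#include <assert.h>
#include <stdio.h>

/* Single source of truth: one X(type, name, printf-format) entry per field. */
#define POINT_FIELDS(X)      \
    X(int,    x,      "%d")  \
    X(int,    y,      "%d")  \
    X(double, weight, "%g")

/* Expansion 1: the struct members. */
#define AS_MEMBER(type, name, fmt) type name;
struct point { POINT_FIELDS(AS_MEMBER) };

/* Expansions 2 and 3: a format string and the matching argument list. */
#define AS_FORMAT(type, name, fmt) #name "=" fmt ";"
#define AS_ARG(type, name, fmt)    , p->name

/* Writes "name=value;" for every field; returns snprintf's result. */
int point_to_string(const struct point *p, char *buf, size_t n) {
    assert(p && buf);
    return snprintf(buf, n, POINT_FIELDS(AS_FORMAT) POINT_FIELDS(AS_ARG));
}
```

Adding a field means touching only `POINT_FIELDS`. It also shows why this gets cumbersome at scale: every new use needs another helper macro, which is where a code generator starts to look attractive.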

Many people still write C because tons of crucial software, probably things you use every day, are written in it and that software needs to be maintained and improved.
It's an old war-time friend that when battle is up we both know how to shoot and be effective at it - both towards enemies as well as our feet.
Small binary and a toolchain that's small and older than most programmers using it, and known to be bug-free. Top tool if you want to write something that a real human being can "get under the hood of" and understand throughout.

As for low-level, sure that's no longer the case. It was a low-level language for K&R and their PDP-11 where they could tell precisely what will be assembler code for each line of their C code and how many CPU cycles it will take. That's no longer the case indeed.

> toolchain [...] known to be bug-free

You cannot be serious. "Well known list of bugs" would be more in line with the state of affairs.

Because FreeRTOS is written in C.
The last time I wrote C was for my OS class
Sometimes Safety suffers performance