by stephc_int13 27 days ago
As a C programmer, I find this kind of bad faith article very irritating.

Yes, the standard library is bad. This is by far the worst part of the C legacy. But it is not that hard to write your own.

String functions like this are not difficult at all, and you can use better naming and semantics, write faster code etc.

C is not the C standard library, ffs.

4 comments

I don't think it's in bad faith.

The distinction between a language and its standard library gets blurry even in theory, and in practice they're nearly inseparable. If a language's standard library has four ways of doing almost the same thing, and they're all fundamentally broken, that's a problem.

If you read the other articles by the same author on his blog, you'll see that he has some strong and weird opinions about C and UB.

Complete BS in my opinion.

Exactly. A wrapper that handles all of the edge cases properly and gives proper reporting just gets added to your own library of functions and the devs get used to using it. Much like the code for abstract data types like lists/hashmaps/etc which neither C nor the standard libraries provide.

Bonus points for having bespoke linting rules to point out the use of known “bad” functions.

In one old project we went through and replaced all instances of sprintf() with snprintf() or equivalent. Once we were happy that we’d got every occurrence we could then add lint rules to flag up any new use of sprintf() so that devs didn’t introduce new possible problems into the code.

(Obviously you can still introduce plenty of problems with snprintf() but we learned to give that more scrutiny.)

> like lists/hashmaps/etc which neither C nor the standard libraries provide

There is a hashmap implementation though: https://man7.org/linux/man-pages/man3/hsearch.3.html

“One hashmap for your entire program” is not generally what people mean when they want a hashmap.

> The three functions hcreate_r(), hsearch_r(), hdestroy_r() are reentrant versions that allow a program to use more than one hash search table at the same time.

Sure there's an implementation, but like the integer comparison functions that sparked this thread there are some severe limitations with the implementation.

(In fact, looking at it again, I assume I'd purposely purged it from my memory given how terrible it is.)

The non-extensible nature is the biggest one. There are plenty of times when the maximum number of elements that need to be stored will not be known in advance. (See the note about hcreate().)

Secondly, the hsearch() implementation requires the keys to be NUL-terminated strings, since "the same key" is determined using strcmp(). Good luck if you want to use a number, pointer, arbitrary structure or anything else as a key.

Any reasonable hash table implementation would not have either of these limitations.
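To make both limitations concrete, here is a minimal sketch of the POSIX API in use. The function name `hsearch_demo` and the table size of 16 are made up for illustration, and the sketch assumes glibc behaviour, where hdestroy() does not free the keys.

```c
#include <search.h>
#include <stddef.h>

/* Hypothetical demo: puts two entries into the single process-wide
 * table and looks one of them up. Returns the stored value, or -1. */
int hsearch_demo(void)
{
    static int one = 1, two = 2;
    ENTRY item;
    ENTRY *found;
    int result = -1;

    /* The capacity is fixed here; the table cannot grow later. */
    if (hcreate(16) == 0) {
        return -1;
    }

    /* Keys must be NUL-terminated strings; they are compared with strcmp(). */
    item.key = "one";
    item.data = &one;
    if (hsearch(item, ENTER) == NULL) {
        hdestroy();
        return -1;
    }

    item.key = "two";
    item.data = &two;
    if (hsearch(item, ENTER) == NULL) {
        hdestroy();
        return -1;
    }

    /* Lookup only uses the key; data is ignored for FIND. */
    item.key = "two";
    item.data = NULL;
    found = hsearch(item, FIND);
    if (found != NULL) {
        result = *(int *)found->data;
    }

    /* Assumes glibc: hdestroy() leaves the (literal) keys alone. */
    hdestroy();
    return result;
}
```

Anything that isn't naturally a string has to be formatted into one before it can be used as a key.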

Maybe I needed to say:

> > like lists/hashmaps/etc which neither C nor the standard libraries provide

... reasonable implementations of.

While snprintf() is better than sprintf(), I find that it's easy for people to not check whether the return value is greater than or equal to the provided size. Sure, it prevents a buffer overflow, but there could still be a string truncation problem.

Similar to how strlcpy() is not a slam dunk fix to the strcpy() problem.
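For reference, the truncation check described above is only a couple of lines. This is a sketch, not anyone's production code; `format_pair` and its "name=value" format are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "name=value" into dst.
 * Returns true only if the whole string, including the NUL, fit. */
bool format_pair(char *dst, size_t dst_size, const char *name, int value)
{
    /* snprintf() returns the length it WANTED to write, excluding the NUL. */
    int needed = snprintf(dst, dst_size, "%s=%d", name, value);

    if (needed < 0) {
        return false;               /* output or encoding error */
    }
    if ((size_t)needed >= dst_size) {
        return false;               /* no overflow, but the text was cut off */
    }
    return true;
}
```

The second test is the one that tends to get forgotten: the buffer is safe either way, but the caller never learns the string is incomplete.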

That's partly the point.

If someone uses sprintf() you have to go faffing around to check whether they've thought about the destination buffer size. The size of the structure may be buried far away through several layers of other APIs/etc.

Using snprintf() doesn't solve this in any way, but checking whether the new use of snprintf() checks the return value is relatively simple. Again, there's still no guarantee that there aren't other problems with snprintf(), but in our experience, once people were forced to use it over sprintf() and had things checked in PR reviews, the number of instances of misuse dropped dramatically.

It wasn't the switch of functions that reduced the number of problems we saw, but the outright banning of the known footgun `sprintf()` and the careful auditing and replacement of it with `snprintf()` that served as a whole load of reference copies for how to use it. We spread the work of replacing `sprintf()` around the team so that everyone got to do some of the switches and everyone got to review the changes. And we found a whole load of possible problems (most of which were very unlikely to ever lead to a crash or corruption.)

The same would apply if you picked any other known footgun and did similar refactoring/rewrites/auditing/etc.

Anyway, I haven't done C commercially/professionally for about 5 years now. I do miss it though.

The thing I find irritating is all the folks who say C is broken because it’s not a write once run anywhere language like JavaScript or Python. Part of the deal has always been that the programmer needs to understand the target platform and the target compiler’s behavior.

Write once run anywhere? But C already is a "write once run anywhere" language! Though, you usually have to recompile first :)

The criticisms related to UB are not about understanding the target platform and the target compiler's behavior. Undefined Behavior is not the same thing as Implementation-defined Behavior, and lots of folks (including me) would be satisfied with reclassifying chunks of UB as the latter.

The behavior of the target platform isn't really the issue. C23 mandates two's complement for signed integers. Most hardware wraps on overflow, but that literally doesn't matter. The standard says a program exhibiting signed overflow is undefined, period.

In practice, UB rules mean the compiler is free to remove checks for signed overflow/underflow, checks for null pointers, etc. This can and does happen. Man, just a few weeks ago, I just had to deal with a crash in a C program that turned out to be due to the compiler removing a null check. That was a painful one.
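As an illustration of the signed overflow case, here is a sketch of a check that does not rely on wraparound. `checked_add` is a hypothetical name; the point is only that the test happens before the addition, using the INT_MAX and INT_MIN limits.

```c
#include <limits.h>
#include <stdbool.h>

/* The fragile idiom looks like this:
 *
 *     if (b > 0 && a + b < a) { handle overflow }
 *
 * If a + b overflows, the program is already undefined, so the compiler
 * may assume it never happens and drop the branch. */

/* Hypothetical helper: stores a + b in *sum and returns true, or returns
 * false without performing the addition if it would overflow. */
bool checked_add(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b) {
        return false;               /* would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;               /* would go below INT_MIN */
    }
    *sum = a + b;
    return true;
}
```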

> crash in a C program that turned out to be due to the compiler removing a null check.

The what now? Though not lately, I programmed in C for 15 years and never saw anything like this. I did see some compiler bugs on obscure platforms (SINIX, IRIX, HPUX on Itanium64, etc.) with proprietary compilers, but this kind of thing would really get me shouting.

Were you able to determine why the compiler did this? Is it a bug in the compiler?

If the compiler can find any operation prior to the null check that would be UB if the value is null (even if it is something that in assembly would be harmless, like performing pointer arithmetic on it), the compiler is allowed to assume the pointer is not null, and thus omit the null check. This could then lead to something that will in practice cause problems like dereferencing the pointer.
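A small sketch of the pattern, with made-up names (`struct node`, `value_or_default`). Whether a given compiler actually deletes the check depends on version and optimization flags, so treat the first function as an example of what is permitted, not a guaranteed reproduction.

```c
#include <stddef.h>

struct node {
    int value;
};

/* Fragile ordering: n->value is read before the check. Reading through
 * a null pointer is UB, so the compiler is allowed to assume n is not
 * null at this point and remove the test that follows. */
int value_or_default_fragile(const struct node *n, int fallback)
{
    int v = n->value;
    if (n == NULL) {
        return fallback;
    }
    return v;
}

/* Check first, dereference afterwards: nothing here lets the compiler
 * conclude that n is non-null before the test runs. */
int value_or_default(const struct node *n, int fallback)
{
    if (n == NULL) {
        return fallback;
    }
    return n->value;
}
```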

Compilers keep taking more and more advantage of inferring that the value in a variable cannot be `x`, because if it were, then some previous usage would have been UB. When people file bugs to complain, the compiler authors point at the spec, which allows them to assume that UB never happens, so the compiler behavior is legal. The only counterargument is if the compiler has chosen to document some specific behavior for this UB (possibly only with specific flags enabled), in which case the compiler treating that scenario as proof of impossibility is indeed a bug (when the required flags are set).

The point of this post, though, is even something as simple as "give me this string as an integer" doesn't have an answer that doesn't come with "are you OK with this best effort parse under these edge cases? Oh and we use this number as error, so you can't parse that".

Like… edge cases? It's parsing a number! We're not talking about I/O on hard vs soft intr NFS mounts, here. There's a right answer.
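For what it's worth, here is roughly what a "just give me the right answer" wrapper tends to look like on top of strtol(). It's a sketch under my own assumptions: the name `parse_int_strict` is made up, it only handles base 10, and it chooses to reject leading whitespace and trailing characters.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical strict parser: succeeds only if the entire string is one
 * base-10 integer that fits in an int. The value is returned through
 * *out, so no number has to double as the error code. */
bool parse_int_strict(const char *s, int *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0') {
        return false;               /* nothing to parse */
    }
    if (isspace((unsigned char)*s)) {
        return false;               /* strtol() would silently skip this */
    }

    errno = 0;
    v = strtol(s, &end, 10);

    if (end == s) {
        return false;               /* no digits were consumed */
    }
    if (*end != '\0') {
        return false;               /* trailing characters after the number */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return false;               /* does not fit in an int */
    }

    *out = (int)v;
    return true;
}
```

That is four separate failure checks for something the caller thinks of as a single operation.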

strlen(), on valid null terminated strings, doesn't come with caveats like "oh we can't measure strings of length 99".

But sure, C is Turing complete. It is possible to solve any problem a Turing machine can solve.

> understand the target platform and the target compiler’s behavior.

This is neither. This is purely the language.

Somewhat true, but C is pretty close to translating directly to machine code, even if most compilers now do so many complex things that the assembly can be pretty far off. My point is that if you have a type int in your program, it's specifically tied to the size of an integer on the target platform. While it can be 16, 32, or 64 bits, it's defined based on what the target platform supports efficiently.

So, when you say, "it's purely the language", I have to disagree. The language means different things on different platforms but it's still defined exactly on the target platform. And it's efficient on that platform.

Nowadays, we prefer correct vs. efficient, which I do agree with, of course. But, I also understand why C is like it is. It is possible to claim it's a problem of the language but I would argue that it is not. C gives us barebones and working with it we have to know this. If that's not needed then sure, other languages will be easier to work with.

> C is pretty close to translating directly to machine code

The C standard defines only its abstract machine, not actual hardware.

> The language means different things on different platforms but it's still defined exactly on the target platform

It's implemented to support a target platform, so that programs behave as if they ran on the abstract machine.

It'd be nice if we could move more stuff from UB to implementation defined.

Do keep in mind that target platform can change, in this regard. E.g. IIRC OpenBSD doesn't guarantee the ABI backward compatibility that Linux does, and can change things like size of int if they want, between versions.

> I also understand why C is like it is

Yup. It can be true that I understand why, and still understand that it's 2026.

Isn't the whole point of C that it's portable assembly though? Needing to understand the target platform/compiler's behavior to write correct code seems to cut against that claim quite a bit.

No. What gives that idea? The language doesn't even fix the data size of its primary numerical type. No way anyone thought that was portable.

Is this sarcasm? I thought C didn't fix the size of int because they were trying to make C programs "portable" between architectures with different natural word sizes. It was a mistake, but I remember that as being the stated reason. I'm happy to be corrected if I'm misremembering my history though.

Why would it be a mistake? It's efficient for the target platform.

The same code can be compiled for different platforms, yes, but the assembly and machine code will vary significantly, so it could behave differently. Porting to a new platform was usually a very complex process, but the code produced was efficient. Nobody seems to care about this nowadays, though, it seems.

Except the way it's done in C is illogical to the point where it negates the value proposition.

People expect numbers to support specific ranges and it is fine to define the data types numerically rather than as a concrete bit pattern, but C just takes the cake.

Char is at least 8 bits, short is at least 16 bits, int is at least as big as short (genius idea), long is at least 32 bits, long long is at least 64 bits.
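Those minimums can be written down as compile-time checks. A sketch, assuming a C11 compiler for `_Static_assert`; `int_storage_bits` is a made-up helper that reports what you actually got on the platform you compiled for.

```c
#include <limits.h>

/* The guaranteed minimum ranges. These hold on every conforming
 * implementation, so none of them should ever fire. */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(SHRT_MAX >= 32767, "short covers at least 16 bits");
_Static_assert(INT_MAX >= SHRT_MAX, "int is at least as big as short");
_Static_assert(LONG_MAX >= 2147483647L, "long covers at least 32 bits");
_Static_assert(LLONG_MAX >= 9223372036854775807LL,
               "long long covers at least 64 bits");

/* Hypothetical helper: storage size of int in bits on this platform.
 * Typically 32 on desktop targets, but only 16 is promised. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```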

The point of "int" is to be the integer equivalent of size_t and therefore be of word size.

But nobody uses int like that. Everyone assumes it's a 32 bit datatype when it isn't.

The use case where you port existing C code to a microcontroller is extremely unappealing, because the number range gets changed under your feet. When I've had to work on embedded software everyone just used int8_t, int16_t, int32_t, int64_t for portability instead.
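A tiny sketch of the kind of thing that breaks, and the fixed-width version. `pixel_area` is a made-up function; the 200 x 200 case is the interesting one, because 40000 does not fit in a 16-bit int.

```c
#include <stdint.h>

/* With plain int on a 16-bit target, w * h would overflow for 200 * 200.
 * Widening both operands to int32_t before multiplying gives the same
 * result regardless of how wide int happens to be. */
int32_t pixel_area(int16_t w, int16_t h)
{
    return (int32_t)w * (int32_t)h;
}
```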

I suppose one could say that they didn't fix the data size so the language would be portable. But I can't see how the intent was that programs would be portable, if you define portable to mean 1:1 functioning across differing platforms.

The people downvoting you are probably not C programmers and love to hate C.

I guess trying to write in Rust makes them irritable.