Hacker News new | ask | show | jobs
by uecker 253 days ago
I do not agree that the integer promotion or casting (?) rules are broken in C. That some people make mistakes because they do not know them is a different problem.

The reason you should make length signed is that you can use the sanitizer to find or mitigate overflow as you correctly observe, while unsigned wraparound leads to bugs which are basically impossible to find. But this has nothing to do with integer promotion and wraparound bugs can also create bugs in - say - Rust.

5 comments

I meant implicit casting, but I guess that really falls under promotion in most cases where it's relevant here (I'm on a train from Aarhus to Copenhagen right now to catch a flight, and I've slept considerably less than usual, so apologies if I'm making some slight mistakes).

The issues really arise when you mix signed/unsigned arithmetic and end up promoting everything to signed unexpectedly. That's usually "okay", as long as you're not doing arithmetic on anything smaller than an int.

As an aside, if you like C enough to have opinions on promotion rules then you might enjoy the programming language Zig. It's around the same level as C, but with much nicer ergonomics, and overflow traps by default in Debug/ReleaseSafe optimization modes. If you want explicit two's complement overflow it has +%, *% and -% variants of the usual arithmetic operations, as well as saturating +|, *|, -| variants that clamp to [minInt(T), maxInt(T)].

EDIT to the aside: it's also true if you hate C enough to have opinions on promotion rules.

I prefer C to Zig. IMHO all the successor languages throw out the baby with the bathwater and add unnecessary complexity. But Zig is much better than Rust, but, still, I would never use it for a serious project.

The "promoting unexpectedly" is something I do not think happens if you know C well. At least, I can't remember ever having a bug because of this. In most cases the promotion prevents you from having a bug, because you do not get unexpected overflow or wraparound because your type is too small.

Mixing signed and unsigned is problematic, but I see issues mostly in code from people who think they need to use unsigned when they shouldn't because they heard signed integers are dangerous. Recently I saw somebody "upgrading" a C code basis to C++ and also changing all loop variables to size_t. This caused a bug which he blamed on working on the "legacy C code" he is working on, although the original code was just fine. In general, there are compiler warnings that should catch issues with sign for conversions.

> Recently I saw somebody "upgrading" a C code basis to C++ and also changing all loop variables to size_t. This caused a bug which he blamed on working on the "legacy C code" he is working on, although the original code was just fine.

I had the same experience about 10 years back when a colleague "upgrade" code from using size_t to `int`; on that platform (ATMEGA or XMEGA, not too sure now) `int` was too small, overflowed and bad stuff happened in the field.

The only takeaway is "don't needlessly change the size and sign of existing integer variables".

I don't think this is the only takeway. My point is that you can reliably identify signed integer overflow using sanitizers and you can also reliably mitigate related attacks by trapping for signed integer overflow (it still may be a DoS, but you can stop more serious harm). Both does not work with unsigned types except in a tightly controlled project where you treat unsigned wraparound as a bug, but this fails the moment you introduce other idiomatic C code that does not follow this.
Yes, this is one of the more subtle pitfalls of C. What helps is that in most contexts the value of 2 billion is large enough that a wraparound would be noticed almost immediately. But when it isn't then it can lead to very subtle errors that can propagate for a long time before anything goes off the rails that is noticed.
It's interesting to hear these takes. I've never had problems catching unsigned wrap bugs with plain old memory sanitizers, though I must admit to not having a lot of experience with ubsan in particular. Maybe I should use it more.
GCC's sanitizer does not catch unsigned wraparound. But the bigger problem is that a lot of code is written where it assumes that unsigned wraps around and this is ok. So you you would use a sanitizer you get a lot of false positives. For signed overflow, one can always consider this a bug in portable C.

Of course, if you consistently treat unsigned wraparound as a bug in your code, you can also use a sanitizer to screen for it. But in general I find it more practical to use signed integers for everything except for modular arithmetic where I use unsigned (and where wraparound is then expected and not a bug)

I've had some fun reviewing some very old code I wrote (1980's) to see what it looked like to me after such a long time of gaining experience. It's not unlike what the OP did here, it reads cleanly but I can see many issues that escaped my attention at the time. I always compared C with a very fast car: you can take some corners on two wheels but if you make a habit of that you're going to end up in a wall somewhere. That opinion has not changed.
I think the correct comparison is a sharp knife. It is extremely useful and while there is a risk it is fully acceptable. The idea that we should all use plastic knifes because there are often accidents with knifes is wrong and so is the idea that we use should abandon C because of memory safety. I follow computer security issues for several decades, and while I think we should have memory safety IMHO the push and arguments are completely overblown - and they are especially not worth the complexity and issues of Rust. I never was personally impacted by a security exploit caused by memory safety or know anybody in my personal vicinity who was. I know many cases where people where affected by other kinds of security issues. So I think those are what we should focus on first. And having timely security updates is a hell lot more important than memory safety, so I am not happy that Rust now makes this harder.
That's an interesting point you are making there. The most common exploits are of the human variety. Even so it is probably a good idea to minimize the chances of all kinds of exploits. One other problem - pet peeve of mine - is that instead of giving people just security updates manufacturers will happily include a whole bunch of new and 'exciting' stuff in their updates that in turn will (1) introduce new security issues and (2) will inevitably try to extract more money from the updaters. This is extremely counterproductive.
The real problem with C is that it's not just a sharp knife, it's a knife with poor ergonomics that makes it more prone to cutting yourself.

The answer to that though is probably more something like Zig than something like Rust.

Hi, are you interested in Zig? Then please check out my port of jsmn to Zig. I wanted to know if people will like it and if there are any downsides others might not. https://github.com/Ferki-git-creator/jsmn_zig
Except any good chef or butcher knows that they should be wearing protective gloves when using sharp knifes.

> Cut-resistant gloves are an essential piece of safety equipment in any kitchen.

https://www.restaurantware.com/blogs/smallwares/how-to-choos...

Where are C's gloves?

I’m sorry, but there is an incredible amount of hard data on this, including the number of CVEs directly attributable to memory safety bugs. This is publicly available information, and we as an industry should take it seriously.

I don’t mean to be disrespectful, but this cavalier attitude towards it reads like vaccine skepticism to me. It is not serious.

Programming can be inconsequential, but it can also be national security. I know which engineers I would trust with the latter, and they aren’t the kind who believe that discipline is “enough”.

CVE are important but there’s also a lot of theatre there. How many are known exploitable? Most aren’t if you follow threat intel. Most of the Internet infrastructure is running c/c++ and is very safe.
It's fine to have a sober view of the severity, but we can hopefully agree in general that writing any program in C or C++ that faces the internet requires extreme caution.
Number of CVEs is completely irrelevant. Also Google's or Microsoft's priorities are completely irrelevant. If you have reliable data from the real world, please show it to me.
So what do you propose to do?
I propose that we start taking the appropriate amount of professional responsibility.

That includes being honest about the actual costs of software when you don’t YOLO the details. Zero UB is table stakes now - it didn’t use to be, but we don’t live in that world anymore.

It’s totally fine to use C or whatever language for it, but you are absolutely kidding yourself if you think the cost is less than at least an order of magnitude higher than the equivalent code written in Rust, C#, or any other language that helps you avoid these bugs. Rust even lets you get there at zero performance cost, so we’re down to petty squabbles about syntax or culture - not serious.

> That some people make mistakes because they do not know them is a different problem.

We can argue til we're blue in the face that people should just not make any mistakes, but history is against us - People will always make mistakes.

That's why surgeons are supposed to follow checklists and count their sponges in and out

Could you expand on how these wraparound bugs happen in Rust? As far as I know, integer overflow panics (i.e. aborts) your code when compiled in debug mode, which I think is often used for testing.
>while unsigned wraparound leads to bugs which are basically impossible to find.

What?

unsigned sizes are way easier to check, you just need one invariant:

if(x < capacity) // good to go

Always works, regardless how x is calculated and you never have to worry about undefined behavior when computing x. And the same invariant is used for forward and backward loops - some people bring up i >= 0 as a problem with unsigned, but that's because you should use i < n for backward loops as well, The One True Invariant.