Hacker News new | ask | show | jobs
by simonask 25 days ago
I think the article's point is that you don't actually have to get weird at all to run into UB.

Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB.

4 comments

My go-to example of "UB is everywhere" is this one:

    int increment(int x) {
        return x + 1;
    }
Which is UB for certain values of x.
C23 removed the whole stuff about indeterminate value and trap representation. Underflow/overflow being silent or not is implementation defined.
Signed overflow is just undefined.
TBF that is the same as saying "signed overflow is UB".
yes but it is a 'picture' that makes you think about it in a different way.
I've long said that the value a programming language offers is as much about what it doesn't allow as what it does allow. Efficiency aside, most useful programs could be written in most languages, but there are an infinite number of programs you could write that aren't particularly useful. Ruling out the programs you might accidentally write that resemble the one you intended is a pretty useful feature of a language, and it's a metric that C and C++ rate quite poorly on IMO.
I would agree that C is "really flexible", but I would say it's primarily flexible because it lets you cast say from a void pointer to a typed pointer without requiring much boilerplate. It's also flexible because it lets you control memory layout and resource management patterns quite closely.

If you want to be standards correct, yes you have to know the standard well. True. And you can always slip, and learn another gotcha. Also true. But it's still extremely flexible.

The problem is that a lot of the flexibility introduced by UB doesn't serve the developer.

Take signed integer overflow, for example. Making it UB might've made sense in the 1970s when PDP-1 owners would've started a fight over having to do an expensive check on every single addition. But it's 2026 now. Everyone settled on two's complement, and with speculative execution the check is basically free anyways. Leaving it UB serves no practical purpose, other than letting the compiler developer skip having to add a check for obscure weird legacy architectures. Literally all it does is serve as a footgun allowing over-eager optimizations to blow up your program.

Although often a source of bugs, C's low-level memory management is indeed a great source of flexibility with lots of useful applications. It's all the other weird little UB things which are the problem. As the article title already states: writing C means you are constantly making use of UB without even realizing it - and that's a problem.

If we're talking two's complement it's not undefined that is right. Having to emit checks though, that is where I beg to differ. A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless. Furthermore, it might be "essentially free" from a branch prediction point, but low and behold caches exist. You would pollute both the instruction cache with those instructions _and_ the branch prediction cache. From this it doesn't follow at all, that there is no cost.

In the end small things do add up, and if you're adding many little things "because it doesn't cost much nowadays" you will end up with slow software and not have one specific bottleneck to look at. I do agree that having the option for checked operations is nice (see C#), but I have needed this behavior (branching on overflow) exactly once so far.

> A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.

You almost always want to change the behavior to erroring out on overflow. The few cases where overflow really is intended and fine can be handled by explicit opt-out.

And I refuse to buy the argument that "small things add up" in the world where we do string building and parsing every few microseconds. Checked math will have unnoticable impact compared to all the other things we do, in almost every type of program.

This string manipulation stuff is very common, and that's why in 2026, an age where science fiction has become a reality, many things are still absurdly slow. Exactly because of such sloppiness, which does accumulate in many cases, and when one least expected it.
100% agreed on the sloppiness. But overflow checking is not sloppiness. It's the opposite of sloppiness. Unchecked math is sloppiness, allowing overflows to happen silently and uncontrollably is sloppiness. It just so happens this kind of sloppiness makes code faster, unlike other kinds of sloppines that make code slower. Not doing necessary safety checks is faster than doing these necessary checks, but it doesn't make these checks any less necessary. Not validating user input also makes code faster, and is also sloppy.
Signed overflow checks are typically not free unfortunately they have a cost of about 5% or thereabouts
In hot paths it can be even more. This is why even Rust defines it as wrapping but elides the overflow panic in release builds.
It is defined as an error. That error’s default handling is wrapping when debug_assertions is off, and panic when it’s on, but since it’s an incorrect program (though not UB) either behavior is acceptable in any mode.
If it is defined as an error, but the compiled build will continue to run with the value wrapped around, I would say that's indistinguishable from UB.
Yeah on average. On some paths it's almost free
You can run your code under ASAN and UBSAN nowadays, it will catch many or most of issues as they happen.

But that's completely besides the point. UB on signed overflow, or really most of UB, is not unrelated to C flexibility. It is a detail of the spec related to portability and performance. IIRC it is even required to make such trivial optimizations as turning

    for (int i = 0; i < n; i++) func(a[i]);
into

    for (Foo *p = a, *last = a + n; p < last; p++) func(p);
saving arithmetics and saving a register, on architectures where `int` is smaller than pointers. But there is also options like -fwrapv on GCC for example, allowing you to actually use signed overflow.
How is undefined behavior necessary for this transformation?
IIRC computation of the address is done by computing offset from base pointer as a multiplication in (32-bit) int, (like p + (i * sizeof (Foo)). The right term might overflow, but due to signed overflow being UB, the compiler is able to assume that it does not, so the transformation to do the arithmetic entirely in (64-bit) pointer space is valid.
Exactly. You as the programmer know that the loop counter won't overflow, and in general, essentially nobody would actually write it that way. But if you don't assume it can't happen, the possibility for signed overflow is everywhere in address computations.

This is also a major blocker for auto-vectorization. Can't coalesce a load of a[i], a[i+1], a[i+2], a[i+3] into a load of a[i:i+3] if there's a possibility that `i+1`, `i+2` or `i+3` wrapped around (thus causing your "contiguous" load to be non-contiguous). This is a big reason why you shouldn't use `unsigned` for loop counters, especially if they're going to be used as an index into an address calculation.

But surely the more natural approach than making this undefined behavior would be making the computation of a[i] take place in 64-bit pointer space rather than 32-bit int space? Why does the compiler need the freedom to emit nasal demons?
*is not related to C flexibility
It's not flexible in practice, because knowing the standard isn't optional. If you make the choice to not follow the standard, you're making the choice to write fundamentally broken software. Sometimes with catastrophic consequences.
I'm making the choice to pass pointers as void to get low-friction polymorphism. I'm making the choice to control the memory layout of my data structures, including of levels and type of indirection. I'm making the choice to control my own memory allocators and closely control lifetimes, closely control (almost) everything that happens in the system.

That has nothing to do with not following the standard.

But be as you may you’re not following the standard.
what is your point?
If you don't follow the standard, gcc -O2 can introduce bugs to your code that you never even wrote. Skipping null checks, executing both branches of a conditional, and so on.
At which point it feels like some sort of high-level assembly-like language, which is simple enough to compile efficiently and stay crossplatform, with some primitives for calls, jumps, etc. could find a nice niche.

Maybe this already exists, even? A stripped down version of C? A more advanced LLVM IR? I feel like this is a problem that could use a resolution, just maybe not with enough of a scale for anyone to bother, vs. learning C, assembly of given architecture, or one of the new and fancy compiled languages.

There's Vale [0] as a structured high-level assembly language, but pretty far from usable right now. I do hope it matures. Basically: All non-control-flow instructions can be directly supported. Control flow is lofted to a higher level and implemented in C-style structured blocks and keywords, which map directly to a subset of the ISA that modifies the program counter. This separation means it's not a proper superset of traditional assembly languages -- you can't paste in arbitrary blocks of existing code -- but a lot of interesting things (for them, implementations of cryptographic primitives) are pretty trivial to port over. And in exchange, you get a well defined Hoare logic that can talk about total correctness, not just [1]'s partial correctness.

[0] https://github.com/project-everest/vale

[1] https://nickbenton.name/coqasm.pdf

Well, Zig is aiming to be a "saner C", and mostly succeeding so far. I hope they make it to production.

Rust is a somewhat more thorough attempt to actually course-correct.

It is basically what you can have today with Object Pascal or Modula-2, with a revamped syntax for C crowds.
Yes, there have been quite a few C inspired Assembly languages for DSPs for example, TI had one.