| HN Mirror



	by simonask 25 days ago
	I think the article's point is that you don't actually have to get weird at all to run into UB. Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB.

4 comments

kzrdude 25 days ago

My go-to example of "UB is everywhere" is this one:

    int increment(int x) {
        return x + 1;
    }

Which is UB for certain values of x.
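To make "certain values of x" concrete: the only problematic input is `INT_MAX`, because `INT_MAX + 1` is not representable in `int`. A minimal sketch of a guarded variant follows; `checked_increment` is a hypothetical helper name, not something from the article or the comment above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: writes x + 1 to *out and returns true,
   or returns false without touching *out when x + 1 would overflow. */
bool checked_increment(int x, int *out) {
    if (x == INT_MAX) {
        return false;   /* x + 1 would be signed overflow, i.e. UB */
    }
    *out = x + 1;       /* safe: x < INT_MAX here */
    return true;
}
```

The caller decides what to do in the failing case, which is exactly the decision the plain `x + 1` version leaves undefined.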


CodeArtisan 25 days ago

C23 removed the whole stuff about indeterminate value and trap representation. Underflow/overflow being silent or not is implementation defined.


saagarjha 25 days ago

Signed overflow is just undefined.


jstimpfle 24 days ago

TBF that is the same as saying "signed overflow is UB".


kzrdude 24 days ago

yes but it is a 'picture' that makes you think about it in a different way.


saghm 25 days ago

I've long said that the value a programming language offers is as much about what it doesn't allow as what it does allow. Efficiency aside, most useful programs could be written in most languages, but there are an infinite number of programs you could write that aren't particularly useful. Ruling out the programs you might accidentally write that resemble the one you intended is a pretty useful feature of a language, and it's a metric that C and C++ rate quite poorly on IMO.


jstimpfle 25 days ago

I would agree that C is "really flexible", but I would say it's primarily flexible because it lets you cast say from a void pointer to a typed pointer without requiring much boilerplate. It's also flexible because it lets you control memory layout and resource management patterns quite closely.

If you want to be standards correct, yes you have to know the standard well. True. And you can always slip, and learn another gotcha. Also true. But it's still extremely flexible.


crote 25 days ago

The problem is that a lot of the flexibility introduced by UB doesn't serve the developer.

Take signed integer overflow, for example. Making it UB might've made sense in the 1970s when PDP-1 owners would've started a fight over having to do an expensive check on every single addition. But it's 2026 now. Everyone settled on two's complement, and with speculative execution the check is basically free anyways. Leaving it UB serves no practical purpose, other than letting the compiler developer skip having to add a check for obscure weird legacy architectures. Literally all it does is serve as a footgun allowing over-eager optimizations to blow up your program.

Although often a source of bugs, C's low-level memory management is indeed a great source of flexibility with lots of useful applications. It's all the other weird little UB things which are the problem. As the article title already states: writing C means you are constantly making use of UB without even realizing it - and that's a problem.


ablob 25 days ago

If we're talking two's complement, it's not undefined, that is right. Having to emit checks, though, is where I beg to differ. A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless. Furthermore, it might be "essentially free" from a branch prediction point of view, but lo and behold, caches exist. You would pollute both the instruction cache with those instructions _and_ the branch prediction cache. So it doesn't follow at all that there is no cost.

In the end small things do add up, and if you're adding many little things "because it doesn't cost much nowadays" you will end up with slow software and not have one specific bottleneck to look at. I do agree that having the option for checked operations is nice (see C#), but I have needed this behavior (branching on overflow) exactly once so far.
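For reference, "having the option for checked operations" can be sketched in C roughly as below. This assumes GCC or Clang, since `__builtin_add_overflow` is a compiler builtin rather than standard C (C23 adds a similar `ckd_add` in `<stdckdint.h>`). The names `add_would_overflow` and `saturating_add` are made up for illustration, and saturation is just one possible policy.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumes GCC or Clang: __builtin_add_overflow is a compiler builtin,
   not part of standard C. It returns true when the mathematical result
   does not fit in *sum; the wrapped result is stored in *sum either way. */
bool add_would_overflow(int a, int b, int *sum) {
    return __builtin_add_overflow(a, b, sum);
}

/* Hypothetical policy: saturate instead of wrapping or trapping. */
int saturating_add(int a, int b) {
    int sum;
    if (add_would_overflow(a, b, &sum)) {
        /* int + int only overflows when both operands share a sign. */
        return (b > 0) ? INT_MAX : INT_MIN;
    }
    return sum;
}
```

The check is opt-in per call site here, so code that never needs it pays nothing for it.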


Xirdus 25 days ago

> A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.

You almost always want to change the behavior to erroring out on overflow. The few cases where overflow really is intended and fine can be handled by explicit opt-out.

And I refuse to buy the argument that "small things add up" in a world where we do string building and parsing every few microseconds. Checked math will have an unnoticeable impact compared to all the other things we do, in almost every type of program.


jstimpfle 25 days ago

This string manipulation stuff is very common, and that's why in 2026, an age where science fiction has become a reality, many things are still absurdly slow. Exactly because of such sloppiness, which does accumulate in many cases, and when one least expects it.


Xirdus 25 days ago

100% agreed on the sloppiness. But overflow checking is not sloppiness. It's the opposite of sloppiness. Unchecked math is sloppiness; allowing overflows to happen silently and uncontrollably is sloppiness. It just so happens this kind of sloppiness makes code faster, unlike other kinds of sloppiness that make code slower. Not doing necessary safety checks is faster than doing these necessary checks, but it doesn't make these checks any less necessary. Not validating user input also makes code faster, and is also sloppy.


saagarjha 25 days ago

Signed overflow checks are typically not free, unfortunately; they have a cost of about 5% or thereabouts.


vlovich123 25 days ago

In hot paths it can be even more. This is why even Rust defines it as wrapping but elides the overflow panic in release builds.


steveklabnik 25 days ago

It is defined as an error. That error’s default handling is wrapping when debug_assertions is off, and panic when it’s on, but since it’s an incorrect program (though not UB) either behavior is acceptable in any mode.


jstimpfle 25 days ago

If it is defined as an error, but the compiled build will continue to run with the value wrapped around, I would say that's indistinguishable from UB.


saagarjha 24 days ago

Yeah on average. On some paths it's almost free


jstimpfle 25 days ago

You can run your code under ASAN and UBSAN nowadays; it will catch many or most issues as they happen.

But that's completely beside the point. UB on signed overflow, or really most of UB, is not unrelated to C flexibility. It is a detail of the spec related to portability and performance. IIRC it is even required to make such trivial optimizations as turning

    for (int i = 0; i < n; i++) func(a[i]);

into

    for (Foo *p = a, *last = a + n; p < last; p++) func(*p);

saving arithmetic and saving a register, on architectures where `int` is smaller than pointers. But there are also options like -fwrapv on GCC, for example, allowing you to actually use signed overflow.


Chinjut 25 days ago

How is undefined behavior necessary for this transformation?


jstimpfle 25 days ago

IIRC computation of the address is done by computing the offset from the base pointer as a multiplication in (32-bit) int, like `p + (i * sizeof (Foo))`. The right term might overflow, but because signed overflow is UB, the compiler is able to assume that it does not, so the transformation to do the arithmetic entirely in (64-bit) pointer space is valid.


tyg13 25 days ago

Exactly. You as the programmer know that the loop counter won't overflow, and in general, essentially nobody would actually write it that way. But if you don't assume it can't happen, the possibility for signed overflow is everywhere in address computations.

This is also a major blocker for auto-vectorization. Can't coalesce a load of a[i], a[i+1], a[i+2], a[i+3] into a load of a[i:i+3] if there's a possibility that `i+1`, `i+2` or `i+3` wrapped around (thus causing your "contiguous" load to be non-contiguous). This is a big reason why you shouldn't use `unsigned` for loop counters, especially if they're going to be used as an index into an address calculation.
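A small sketch of the wraparound in question, assuming a 32-bit unsigned index and 64-bit address arithmetic (the function names are made up for illustration). Unsigned arithmetic is defined to wrap, so incrementing in 32 bits and then widening is not always the same as widening first and then incrementing, and the compiler has to preserve that difference.

```c
#include <stdint.h>

/* Index computed in 32 bits, then widened: wraps to 0 at UINT32_MAX. */
uint64_t next_index_narrow(uint32_t i) {
    return (uint64_t)(uint32_t)(i + 1u);
}

/* Index widened first, then incremented: never wraps for 32-bit input. */
uint64_t next_index_wide(uint32_t i) {
    return (uint64_t)i + 1u;
}
```

With a signed `int` counter the overflowing case is undefined, so the compiler is allowed to treat the two forms as interchangeable and pick the cheaper one.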


Chinjut 24 days ago

But surely the more natural approach than making this undefined behavior would be making the computation of a[i] take place in 64-bit pointer space rather than 32-bit int space? Why does the compiler need the freedom to emit nasal demons?


jstimpfle 25 days ago

*is not related to C flexibility


simonask 25 days ago

It's not flexible in practice, because knowing the standard isn't optional. If you make the choice to not follow the standard, you're making the choice to write fundamentally broken software. Sometimes with catastrophic consequences.


jstimpfle 25 days ago

I'm making the choice to pass pointers as void to get low-friction polymorphism. I'm making the choice to control the memory layout of my data structures, including the levels and types of indirection. I'm making the choice to control my own memory allocators and closely control lifetimes, closely control (almost) everything that happens in the system.

That has nothing to do with not following the standard.
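As one standard-conforming instance of the void-pointer style, here is a minimal sketch built on `qsort` from `<stdlib.h>`. The names `compare_ints` and `sort_ints` are hypothetical and chosen only for this example.

```c
#include <stdlib.h>

/* Comparator with the signature qsort expects. The void pointers are
   converted back to the real element type before use. */
static int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Hypothetical wrapper: sorts n ints in ascending order. */
void sort_ints(int *values, size_t n) {
    qsort(values, n, sizeof values[0], compare_ints);
}
```

Converting a `void *` back to the type the object really has is well defined; the trouble only starts when it is converted to an unrelated type and dereferenced.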


saagarjha 25 days ago

But be that as it may, you're not following the standard.


jstimpfle 25 days ago

what is your point?


Xirdus 25 days ago

If you don't follow the standard, gcc -O2 can introduce bugs to your code that you never even wrote. Skipping null checks, executing both branches of a conditional, and so on.
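A commonly cited shape of the null-check case, sketched with a hypothetical `struct packet`: if a pointer is dereferenced before it is tested, the compiler may assume it is non-null and drop the later test. Only the well-defined ordering is written as real code below; the problematic ordering is left in a comment.

```c
#include <stddef.h>

struct packet {   /* hypothetical type for illustration */
    int length;
};

/* Problematic ordering (do not do this):
 *     int len = p->length;        // dereference first
 *     if (p == NULL) return -1;   // check may be optimized away
 *
 * Well-defined ordering: test the pointer before any dereference. */
int packet_length(const struct packet *p) {
    if (p == NULL) {
        return -1;
    }
    return p->length;
}
```

Whether a given compiler and optimization level actually removes the late check varies, so this is an illustration of the pattern rather than a guaranteed reproduction.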


3form 25 days ago

At which point it feels like some sort of high-level assembly-like language, which is simple enough to compile efficiently and stay crossplatform, with some primitives for calls, jumps, etc. could find a nice niche.

Maybe this already exists, even? A stripped down version of C? A more advanced LLVM IR? I feel like this is a problem that could use a resolution, just maybe not with enough of a scale for anyone to bother, vs. learning C, assembly of given architecture, or one of the new and fancy compiled languages.


addaon 25 days ago

There's Vale [0] as a structured high-level assembly language, but pretty far from usable right now. I do hope it matures. Basically: All non-control-flow instructions can be directly supported. Control flow is lofted to a higher level and implemented in C-style structured blocks and keywords, which map directly to a subset of the ISA that modifies the program counter. This separation means it's not a proper superset of traditional assembly languages -- you can't paste in arbitrary blocks of existing code -- but a lot of interesting things (for them, implementations of cryptographic primitives) are pretty trivial to port over. And in exchange, you get a well defined Hoare logic that can talk about total correctness, not just [1]'s partial correctness.

[0] https://github.com/project-everest/vale

[1] https://nickbenton.name/coqasm.pdf


simonask 25 days ago

Well, Zig is aiming to be a "saner C", and mostly succeeding so far. I hope they make it to production.

Rust is a somewhat more thorough attempt to actually course-correct.


pjmlp 25 days ago

It is basically what you can have today with Object Pascal or Modula-2, with a revamped syntax for C crowds.


pjmlp 25 days ago

Yes, there have been quite a few C-inspired assembly languages, for DSPs for example; TI had one.
