HN Mirror


	by LPisGood 42 days ago
	I don’t work a ton with C, but I wonder how C programmers keep track of what behavior is and is not defined. It seems like there are many possible edge cases.

4 comments

wat10000 42 days ago

We get by on a combination of matching patterns (any pointer cast gets a lot of scrutiny, for example), compiler warnings, tools like UBSan, debugging when things go wrong, and sheer dumb luck.

Having an understanding of how the code gets transformed into machine code helps. For this case, there's the basic idea that `a++` boils down to three conceptual operations: fetch, add, and store, and those can potentially be interleaved with other parts of the statement. In something like `a++ + ++b`, the interleaving doesn't affect the outcome no matter how it's done. In `a++ + ++a`, the interleaving can affect the outcome, and that's your sign that something might be wrong.
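
A minimal C sketch of that contrast, assuming nothing beyond standard C. The function names are hypothetical. The undefined `a++ + ++a` form appears only in a comment, since there is no meaningful result to show for it; the second function is one way to rewrite it with a fixed order.

```c
/* Distinct objects: the fetch/add/store steps of a++ and ++b touch
   different variables, so every interleaving gives the same sum. */
int sum_distinct(int a, int b) {
    return a++ + ++b; /* old value of a, plus b + 1 */
}

/* Same object: `a++ + ++a` modifies a twice in one expression, and the
   two modifications are unsequenced, which is undefined behaviour in C.
   Splitting it into separate full expressions pins down the order. */
int sum_same_sequenced(int a) {
    int first = a++;  /* first gets the old a; a is now old a + 1 */
    int second = ++a; /* a becomes old a + 2; second gets that value */
    return first + second;
}
```

With these definitions, `sum_distinct(1, 2)` and `sum_same_sequenced(1)` both return 4.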

Any memory safety issue in C code has to involve UB at some point. You can see how prevalent those are, and deduce how not-particularly-great we are at keeping track of UB.

MaxBarraclough 42 days ago

> Having an understanding of how the code gets transformed into machine code helps

I'm not sure about that. Knowing assembly is not a substitute for knowing how the language is defined. Sometimes C/C++ programmers with some assembly knowledge reason themselves into thinking that what they're asking of the language must have well-defined behaviour, when in fact it's undefined behaviour. It doesn't really matter whether the interleaving order can change the output. (++i)++ is, apparently [0], undefined behaviour in C but has well-defined behaviour in C++.

[0] https://stackoverflow.com/a/58841107

wat10000 41 days ago

I don't mean assembly in this case, but something more like the compiler's view of the code. a++ can be broken down into more primitive operations, and might actually be, depending on how the compiler is implemented. The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.
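
As a rough illustration of "broken down into more primitive operations", here is `a++` spelled out as fetch, add, and store. This is a conceptual sketch, not how any particular compiler implements it, and `post_increment` is a hypothetical name. Note that wrapping the steps in a function actually removes the freedom being discussed: a called function's body is never interleaved with the caller's other evaluations, whereas a bare `a++` inside a larger expression can have its steps scheduled around other subexpressions.

```c
/* Hypothetical helper spelling out what `(*a)++` means conceptually.
   Precondition: *a < INT_MAX, since signed overflow is undefined. */
int post_increment(int *a) {
    int old = *a;          /* fetch */
    int updated = old + 1; /* add */
    *a = updated;          /* store */
    return old;            /* the value of the expression a++ */
}
```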

MaxBarraclough 41 days ago

> The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.

No, that's not right. It's undefined behaviour, not merely an unspecified order of evaluation. Roughly speaking, the behaviour of the entire program is unconstrained by the language standard after execution of that statement. It could crash the whole process, for instance, or go haywire.

(Again, that's in C, apparently, but not in C++.)
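
For contrast, here is a sketch of what a merely unspecified order of evaluation looks like in C (all names are hypothetical). The compiler may call either function first, but each call runs to completion without interleaving, so the program stays well-defined; only the recorded order can vary.

```c
/* Records which function ran first. */
static int trace[2];
static int calls;

static int call_f(void) { trace[calls++] = 1; return 10; }
static int call_g(void) { trace[calls++] = 2; return 20; }

/* The order in which call_f() and call_g() run is unspecified, but the
   two bodies are indeterminately sequenced (never interleaved), so this
   is well-defined: the sum is always 30, and only the order recorded in
   trace may differ between compilers. */
int call_both(void) {
    calls = 0;
    return call_f() + call_g();
}
```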

wat10000 40 days ago

It's worse than that, the behavior of the entire program is unconstrained by the language standard beforehand too. Raymond Chen discusses how things can go wrong once you're going to reach UB even before you get to it: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

Anyway, I didn't mean to imply that things behaved as written aside from ordering issues. I only meant that this sort of principle can help you remember where UB lurks. Generally, where a kind C compiler might just mess with your numbers a bit, an evil C compiler can legally make demons fly out of your nose.
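
A sketch of the kind of code that can "time travel" this way. The table contents and names are hypothetical and are not taken from the linked post. The buggy loop is kept in a comment because it has UB; the compiled function is the corrected one.

```c
#include <stdbool.h>

#define TABLE_LEN 4
static const int table[TABLE_LEN] = {1, 2, 3, 4}; /* hypothetical data */

/* Buggy version, kept as a comment because it has UB:
 *
 *   for (int i = 0; i <= TABLE_LEN; i++)
 *       if (table[i] == v) return true;
 *   return false;
 *
 * When v is not in the table, the loop reads table[TABLE_LEN], one past
 * the end. Since a valid execution never performs that read, an
 * optimizer may assume the loop always returns true before reaching it
 * and compile the whole function to an unconditional `return true`. */

/* Corrected: i < TABLE_LEN keeps every read in bounds. */
bool exists_in_table(int v) {
    for (int i = 0; i < TABLE_LEN; i++) {
        if (table[i] == v) {
            return true;
        }
    }
    return false;
}
```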

MaxBarraclough 40 days ago

> It's worse than that, the behavior of the entire program is unconstrained by the language standard beforehand too. Raymond Chen discusses how things can go wrong once you're going to reach UB even before you get to it

Heh, yes, that's exactly what I was thinking when I wrote "roughly speaking".

> where a kind C compiler might just mess with your numbers a bit, an evil C compiler can legally make demons fly out of your nose

Yes, signed integer overflow is another example. Presumably it's specified that way because it's simpler than trying to spell out all the behaviours the compiler is permitted to implement, and on top of that there are trap representations to worry about. I doubt modern compilers get much optimization benefit from it, though. There's a StackOverflow thread discussing the reasons it's specified this way: https://stackoverflow.com/q/1860461
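
A small sketch of the practical consequence, assuming standard C only (function names are hypothetical). Unsigned arithmetic wraps by definition, while signed overflow is UB, so any check has to happen before the signed addition, not by inspecting the result afterwards.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
unsigned wrap_add(unsigned a, unsigned b) {
    return a + b;
}

/* Signed overflow is UB, so the check must come before the addition;
   testing the result afterwards is too late and may be optimized away. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* a + b would fall outside [INT_MIN, INT_MAX] */
    }
    *out = a + b;
    return true;
}
```

GCC and Clang also provide the non-standard `__builtin_add_overflow`, and C23 standardizes `ckd_add` in `<stdckdint.h>`.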

lmm 41 days ago

They don't. In C culture, some kinds of undefined behaviour are taken seriously and some aren't. If you want to write code that "works", you emulate what popular performance benchmarks do (whether their code is undefined according to the standard or not), since those are what C compiler developers actually care about.

1718627440 41 days ago

Personally, whenever I write a modifying statement, I think about the domains of the inputs and make sure that the condition needed to stay within the existing range is checked. If it is not, I either write the condition, reduce the input domain, or increase the output domain.
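
One way to read those three options, as a sketch with hypothetical function names. Option 2 assumes an 8-bit `signed char` and option 3 assumes `int` is at most 32 bits; both hold on mainstream platforms but are not guaranteed by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* 1. Write the condition: refuse an increment that would leave int's range. */
bool try_increment(int *x) {
    if (*x == INT_MAX) {
        return false; /* ++*x would be signed overflow (UB) */
    }
    ++*x;
    return true;
}

/* 2. Reduce the input domain: assuming 8-bit signed char, x is in
   [-128, 127], so x * x is at most 16384, which fits in any int
   (INT_MAX is at least 32767). */
int square_small(signed char x) {
    return x * x; /* both operands are promoted to int first */
}

/* 3. Increase the output domain: compute the product in a wider type.
   Assumes int is at most 32 bits, so the product fits in long long,
   which is at least 64 bits. */
long long wide_product(int a, int b) {
    return (long long)a * b;
}
```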

IshKebab 42 days ago

They don't really. In fact there are many things that are technically UB but are so common that compilers can't really treat them as UB. E.g. type punning via unions.

el_pollo_diablo 42 days ago

Type punning via unions is not UB in C in general, but it is in C++ IIRC.

I write "in general" because, as with other forms of memory reinterpretation (memcpy or copy through a character type), evaluating a trap representation triggers UB.
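
A sketch of union punning next to the `memcpy` route, assuming `float` is a 32-bit IEEE-754 type (true on mainstream platforms, not guaranteed by C; the static assertion only checks the size). Function names are hypothetical. Reading the bits back as `uint32_t` sidesteps the trap-representation caveat, since exact-width integer types have no padding bits.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Type punning through a union: permitted in C, per the discussion above. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u; /* reinterprets the stored bytes as uint32_t */
}

/* The memcpy route: copies the object representation byte for byte. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the IEEE-754 assumption, both functions return `0x3F800000` for `1.0f`.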

Chaosvex 41 days ago

The short version is that it's fine in C++ as long as you only read the member that was last written to, or read through a char type.

1718627440 41 days ago

And the slightly longer version is that there are three types involved: the type of the access, the effective type of the object[0], and the type of the variable. The type of the variable only matters for the compiler's warnings; as long as the effective type and the type of the access are the same, it isn't UB.

[0] the C meaning of an object, not the C++ one
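
A sketch of the effective-type idea in C, with hypothetical function names. Storage from `malloc` has no declared type, so a store through a non-character lvalue sets its effective type; a character-type access is allowed regardless of effective type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reading any object's bytes through unsigned char is always permitted,
   whatever the object's effective type. */
unsigned byte_sum(const void *obj, size_t n) {
    const unsigned char *p = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += p[i];
    }
    return sum;
}

/* Storing through a uint32_t lvalue gives the malloc'd object effective
   type uint32_t, and reading it back through the same type is fine.
   Reading it through, say, a float lvalue instead would be UB. */
uint32_t store_then_load(void) {
    uint32_t *p = malloc(sizeof *p);
    if (p == NULL) {
        return 0;
    }
    *p = 42;
    uint32_t v = *p;
    free(p);
    return v;
}
```

For `uint32_t v = 0x01020304`, `byte_sum(&v, sizeof v)` is 10 regardless of byte order.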

IcyWindows 41 days ago

Yeah, undefined behavior just means not defined in the specification.

I would argue that most languages only have one compiler, so it doesn't matter what's in the specification.
