Hacker News new | ask | show | jobs
by kmeisthax 1033 days ago
Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.

We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.

My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."

5 comments

> For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines.

This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands) because most of the times you need an ordinary math, not math modulo 2^32. For those rare cases where wrap around is desired, there should be a function like add_and_wrap() or a special operator.

UBSan covers each of those except provenance checking, and ASan mostly catches provenance problems even though that's not directly the goal. There are some dumb forms of UB not caught by any of the sanitizers, but most of them are.

Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.

> Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.

All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.

> We need a C interpreter

Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.

> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.

As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.

> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.

It's not that bad.

> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."

The spec uses implementation defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, which you could say would be reasonable to call it implementation specific and enumerate the behaviors of supported machines.

C had a long history with hardware manufacturers, compiler writers, and software developers though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiset for software development, ignoring the other considerations. So a decision is not an example of design by committee gone wrong just because happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation dependent). You would have to know why such a decision was made.

The blog post company does sell a C interpreter that checks for all undefined behaviors (with provenance and offset).
The general problem with this argument is that “do what the hardware does” is actually not easy to reason about. The end results of this typically are impossible to grok.
Not if possible implementations are specified, and especially if you can target machines of particular behavior. Which is of course how you can write endian and bit size portable code.