by tialaramex 1066 days ago
> All undefined behavior could become "implementation defined" tomorrow, where the C compiler becomes more like a high-level assembler (again), and you could still jump the instruction pointer into arbitrary program text.

Try to work this through in your head. Imagine how you need to specify the working of the abstract machine in order to allow this. How do we talk about an "instruction pointer" on the abstract machine? What are the instructions it's pointing to? Am I defining an entire bytecode VM?

Nah, instead you're going to do one of two things. One: "Undefined Behaviour" which we explicitly took off the table, or Two: "If this happens the program aborts". And with that the big problem evaporates. Does it make those C programmers happy? I expect not.
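
To make option Two concrete, here is a minimal sketch in C. It assumes a hypothetical helper, `checked_get` (my own name, not a standard library function), and shows only one narrow case: an out-of-range array index gets the defined behaviour "the program aborts" instead of being left undefined.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any standard library): reads data[index],
   but calls abort() on a null array or out-of-range index instead of
   performing an access whose behaviour would be undefined. */
int checked_get(const int *data, size_t len, size_t index)
{
    if (data == NULL || index >= len) {
        fprintf(stderr, "checked_get: index %zu out of range (len %zu)\n",
                index, len);
        abort(); /* defined outcome: abnormal program termination */
    }
    return data[index];
}
```

With `int values[3] = {10, 20, 30};`, the call `checked_get(values, 3, 2)` returns 30, while `checked_get(values, 3, 3)` terminates the program via `abort()` rather than reading past the end of the array.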


Implementation defined means the compiler must specify the behavior, but it has near-total freedom, and it can define the behavior specifically for the target system. There is no abstract machine. If I use GCC on Linux x86-64, then there very much is an instruction pointer.
In the real world, compilers just specify that the behaviour is undefined and tell you to suck it up. But we're talking about a hypothetical where we aren't allowing Undefined Behaviour. Saying "Oh, but we can if we say it's the implementation choosing" is a get-out that is meaningless for the hypothetical. If you don't like the hypothetical, just refuse to engage with it instead.
I'm using specific, standards-defined language that's relatively well known. For example, sizeof(int) is implementation defined, meaning it must have a documented definition specific to the implementation (e.g., with gcc targeting x86_64-linux-gnu, it's 4).
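
As a rough illustration of what "implementation defined" looks like from inside a program, here is a small C sketch. The function names are my own (hypothetical); each one returns a value that ISO C leaves to the implementation to choose and document, so the results can differ between compilers and targets.

```c
#include <limits.h>
#include <stddef.h>

/* Size of int in bytes: implementation-defined, documented per target. */
size_t int_size_bytes(void)
{
    return sizeof(int);
}

/* Whether plain char behaves as a signed type: implementation-defined. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Right shift of a signed value: when v is negative, the resulting value
   is implementation-defined (commonly an arithmetic shift). */
int shift_right_one(int v)
{
    return v >> 1;
}
```

On the gcc x86_64-linux-gnu example above, `int_size_bytes()` would return 4. None of these calls is undefined; the point is only that the answer comes from the implementation's documentation, not from the language standard alone.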

In languages like C that are closer to the machine, not everything has to be specified strictly in terms of a generic abstract machine.

I'm not trying to be hostile or evasive or derisive. I'm just genuinely responding to your original comment, which I think missed some important info. My point was that if we imagine a different world from the one we're in right now, one where all undefined behavior became implementation-defined behavior, then there would still be a need for mitigations like endbr64. So I'm not painting a rosy picture for C. I just think undefined behavior is a red herring. Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.

> Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.

The machine is in the real world and is thus obliged to have some actual behaviour, but it is not always practical to discern what that behaviour would be, let alone make it reliable across a product line and document it in an understandable way. As a result, your CPU's documentation does in effect include "Undefined Behaviour".

True, when writing my comment I wanted to qualify it to the same effect, but thought it would be an unnecessary subtlety to the general thrust of my point. That is, we can ignore this kind of "undefined behavior in the machine itself" for the purposes of this particular discussion.
I don't see how to ignore it though. If we're defining the behaviour, but our "definition" doesn't specify the actual behaviour because it's written in terms of hardware with no clearly defined behaviour for that situation, then it's just wordplay; we're not really doing what I set out to do.