by petergeoghegan 1031 days ago
> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false.

"C has been characterized (both admiringly and invidiously) as a portable assembly language" - Dennis Ritchie

The idea of C as a portable assembler is not without its problems, to be sure -- it is an oxymoron at worst, and a squishy idea at best. But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd. The Linux kernel's memory-barriers.txt famously starts out by saying:

"Some doubts may be resolved by referring to the formal memory consistency model and related documentation at tools/memory-model/. Nevertheless, even this memory model should be viewed as the collective opinion of its maintainers rather than as an infallible oracle."

Isn't that consistent with the general idea of a portable assembler?

> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.


> But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd.

It's not taken seriously because it shouldn't be taken seriously. It's a profoundly ignorant idea that's entirely delusional about reality. Architectures differ in ways that are much more profound than how parameters go on the stack or what arguments instructions take. As a matter of fact the C standard bends over backwards in an attempt to avoid specifying a memory model.

Any language that takes itself seriously is defined in terms of its abstract machine. The only alternative is the Perl way: "the interpreter is the specification", and I don't see how that's any better.

> It's not taken seriously because it shouldn't be taken seriously

I really don't know what you're arguing against. I never questioned the general usefulness of an abstract machine. I merely pointed out that a large amount of important C code exists that is in tension with the idea of an all-important abstract machine. This is an empirical fact. Is it not?

You are free to interpret this body of C code as "not true ISO C", I suppose. Kind of like how the C standard leaves compilers free to remove integer overflow checks that rely on undefined behavior.
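
For anyone who hasn't run into the overflow-check issue, here is a minimal sketch of the pattern being referred to. The function names are made up for illustration. The first check relies on signed overflow, which ISO C leaves undefined, so an optimizing compiler is allowed to assume it never happens and may fold the comparison to false. The second asks the same question without ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical name. Relies on signed overflow wrapping around.
   Signed overflow is undefined behavior in ISO C, so an optimizer may
   assume a + 1 never overflows and reduce this to "return false". */
bool increment_overflows_ub(int a) {
    return a + 1 < a;
}

/* Hypothetical name. Asks the same question with no overflow at all,
   so the result is well defined for every value of a. */
bool increment_overflows_checked(int a) {
    return a == INT_MAX;
}
```

Whether a given compiler actually removes the first check depends on the compiler, its version, and the optimization flags, which is more or less the complaint.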

> As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

I mean, C explicitly specifies a memory model and has since C11.
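
As a rough illustration of what the C11 memory model gives you, here is a sketch of release/acquire publication using `<stdatomic.h>`. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical. The intent is that `publish` runs on one thread and `try_consume` on another; thread creation is left out to keep the sketch short.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic data */
static atomic_bool ready;  /* static storage, so it starts out false */

/* Writer side: write the data, then publish it with release ordering. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: returns true and fills *out once the data is published.
   The acquire load pairs with the release store above, so a reader that
   sees ready == true also sees the write to payload. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The guarantee comes from the language's abstract machine rather than from any particular CPU's ordering rules.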

I wonder what's the best solution here then. A different language that actually is portable assembly, or one with less undefined behaviour or simpler semantics (e.g. RIIR), or making -O0 behave as portable assembly?

Step 1: Define just what "portable assembly" actually means.

An assembly program specifies a sequence of CPU instructions. You can't do that in a higher-level language.

Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow, or dereferencing a null pointer, or writing outside the bounds of an array object?

You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.
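
To make the "mandatory run-time checks" option concrete, here is a sketch of what a checked array store could look like if written by hand in C. `checked_store` is a hypothetical helper, not something from the standard library, and the caller has to pass the array length explicitly because a C pointer does not carry one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store value at a[i] only if a is non-null and
   i is within the len elements the caller says the array has.
   Returns false instead of writing out of bounds. */
bool checked_store(int *a, size_t len, size_t i, int value) {
    if (a == NULL || i >= len)
        return false;
    a[i] = value;
    return true;
}
```

A language that inserted this check on every store would need to track the length for you, which is part of why it ends up at a higher level than C.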

> Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow

Whatever the CPU does. E.g., on x86, two's complement.

> or dereferencing a null pointer

Whatever the CPU does. E.g., on x86/Linux in userspace, it segfaults 100% predictably.

> or writing outside the bounds of an array object?

Whatever the CPU does. E.g., on x86/Linux, write to whatever is next in memory, or segfault.

> You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.

No checks needed. Since we're talking about "portable assembly", we're talking about translating to assembly in the most direct manner possible. So dereferencing a NULL pointer literally reads from address 0x0.

> What would such a language say about the behavior of integer overflow

Two's complement (i.e. the result which is equivalent to the mathematical answer modulo 2^{width})
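
For what it's worth, you can already get this behavior in today's C by doing the arithmetic in unsigned types, which are defined to wrap modulo 2^{width}. A sketch, with `wrapping_add_i32` as a made-up name:

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit signed addition that wraps modulo 2^32. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    /* Unsigned arithmetic is defined to wrap, so this never overflows. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* Converting an out-of-range value back to int32_t is
       implementation-defined in ISO C. Assumption here: the compiler
       reduces modulo 2^32, which is what GCC documents. */
    return (int32_t)sum;
}
```

With that assumption, `wrapping_add_i32(INT32_MAX, 1)` gives `INT32_MIN`. The catch is that you have to opt in at every call site instead of getting it from plain `+`.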

> dereferencing a null pointer

A load/store instruction to address zero.

> writing outside the bounds of an array object

A store instruction to the corresponding address. It's possible this could overwrite something important on the stack like a return address, in which case the compiler doesn't have to work around this (though if the compiler detects this statically, it should complain rather than treating it as unreachable).

The reason not to define these things is exactly so C can be used as a high-level assembler, and the answer is always “whatever it is that the CPU naturally does”

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler”"

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30

That's an interesting opinion.

But it has very little to do with the C programming language.

> The idea of C as a portable assembler is not without its problems

The main problem is that C is not a "portable assembler". You mainly argue that it should be, but it simply isn't (and hasn't been for a long time if it ever was).

> The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.

Isn't that beside the point? If you want C to be a "portable assembler" you have to write a standard that specifies its behavior. The compilers will then follow.