Hacker News new | ask | show | jobs
by Dylan16807 38 days ago
Compilers are not able to prevent you from violating must/shall in the general case. So they're not held to that bar. Unless the standard says not to compile it, it's not a compiler bug.

Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable. And even optimizations that make sense here could corrupt the value pretty badly and possibly lead to worse errors.

1 comments

> Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable.

OK. What is the value of a spec to which compliance is impossible?

The compiler does comply to the spec. It's the program that fails to comply with the spec. It's definitely possible to write programs that have no undefined behavior.
The compiler is supposed to compile programs that comply with the spec, and not compile programs that don't.

The concept of "compiling a program that doesn't comply with the spec" doesn't even exist! A text file that doesn't comply with the C spec isn't a C program. That's what it means to be "the spec".

No, there's a fun C++ talk - by I want to say Chandler Carruth - in which the speaker points out that C++ is a language defined to have false positives for the question "Is this a valid program?"

The mechanism in the ISO document is phrases of the form "Ill-formed, No Diagnostic Required" which is often shortened to IFNDR. Lets break that down. "Ill-formed" means this is not a valid C++ program. On its own that means the compiler should provide a diagnostic (an error messag) explaining that your program isn't valid. For example if the program text were to just consist of the word "fuck" that's ill-formed and will be diagnosed. "No Diagnostic Required" says in this case though, we don't require the compiler to report this problem.

Why do that? So originally there's a purely practical reason, but ultimately there's a philosophical one. C++ like C before it wants to translate many individual program files and then somehow cobble the resulting output into a single executable. So this means function A over here, using type T from a different file cannot know for sure about type T, instead C++ has a thing called the "One Definition Rule" which says you must somewhat define T each time it's needed, but all the definitions must be the same. What if you don't (by mistake or on purpose)? Well that will cause chaos, so, IFNDR.

Philosophically IFNDR is a way to resolve the dilemma from Rice's Theorem. Back in about 1950 this guy named Henry Rice got his PhD for proving that any non-trivial semantic property of a program is Undecidable. This isn't "Oh no, it's quite hard to do this" it's a straight up mathematical proof that it can't be done. Deciding reliably whether a program has any† semantic property isn't possible. Sometimes we're sure, and that's fine, but the dilemma is for the tricky cases: What do we do when we're not sure?

IFNDR is C++ choosing "Fuck it, it's fine" for this case. Maybe your program is nonsense, it might do absolutely anything, but you don't get even a warning from the compiler. This is Chandler's "false positive".

Rust chooses the opposite. When the compiler can't see why your program is sense it will be rejected, even if you and a room full of compiler experts agree it should work too bad, it doesn't compile. You get a diagnostic explaining why your program was rejected.

† Trivial means either all programs have the property or none do and so isn't interesting. As a result the restriction to "non-trivial" properties isn't much help.

> The concept of "compiling a program that doesn't comply with the spec" doesn't even exist!

Wrong. Lots of spec violations only happen at runtime and can't be predicted at compile time.

Here's an easy example. You're the compiler. I hand you what appears to be valid C code that allocates an array and then asks the user which slot to use. It doesn't verify the slot is in bounds, just puts a number in array[slot], does some math with it, and then prints the result. Does my program comply with the spec? Do you compile it?

This is "Wrong" in the sense that C++ does really work like this, but it's not wrong in the sense that this is somehow unavoidably the case.

For example if you attempt an equivalent mistake in WUFFS that will be rejected.

Your WUFFS compiler will say this variable named slot must be a non-negative integer smaller than the length of the array, but as far as it can tell you didn't ensure that was true, therefore this code is nonsense, do better.

As I explained in my sister reply, in a broader context some of these are semantic properties and so there's a dilemma and C++ chooses to resolve that dilemma by accepting nonsense programs, but that wasn't the only available resolution and I am confident it's the wrong choice.

> The compiler is supposed to compile programs that comply with the spec

Yes.

> and not compile programs that don't.

No, there is no limitation on what a compiler does in this case.

> The concept of "compiling a program that doesn't comply with the spec" doesn't even exist!

It does, it is called "undefined behaviour".

> A text file that doesn't comply with the C spec isn't a C program.

That's the point. A program that contains UB is not a valid C program. That's what UB means.

That may be how you want C to be specified, but that's not how C is specified. The compiler is always allowed to assume that the input program is free of undefined behaviors.

C++ has an even more trivial example: To be a valid C++ program, all loops without side-effects must terminate. Thus determining if some C++ programs are valid would require solving the halting problem.

> OK. What is the value of a spec to which compliance is impossible?

It lets you tell people you have a spec? It makes it easy for compiler developers to dismiss bug reports with "your code violated the spec"?

Welcome to C.

But more seriously it's the job of the program to not do undefined things.

It's the job of a language designer to define everything.
C should do better about the things that could be readily defined, but there's no way to have arbitrary pointers and define everything.
> but there's no way to have arbitrary pointers and define everything.

What's the undefined behavior in assembly?

Assembly is kind of at the crossroads of everything being defined and nothing being defined, when you consider things like writing random data to memory and executing it... But anyway here's the first thing I found to answer that: https://news.ycombinator.com/item?id=9578178

Probably more important, way too many things in assembly vary by exact model. Can you name a portable language that fits those criteria?

>What is the value of a spec to which compliance is impossible?

Are you saying, what's the value of a language spec that allows undefined behavior, as C does?

Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

It's also difficult to define a formal system that won't have undefined expressions. Mathematics itself is full of them (in logic, "this sentence is a lie" has no truth value; you can't define the set of all sets, or a set of sets that don't contain themselves; etc).

That said, I think we've settled on a rather silly choice here with the "++" operator.

Personally, I'd do away with the ++ operator in either pre- or post- increment forms, or at least disallow it in arithmetic expressions.

The only thing having it realistically accomplished is saving a few characters when writing a for-loop in C.

Even for that it's not necessary.

The problem with it is that, unlike normal arithmetic operators, it both returns a value and assigns one, which means that you can assign values to several variables in a single arithmetic expression, as in

     a = b++;
...which C, in general, allows, as in:

     a = (b = b + 1);
The result of these two expressions, of course, is different.

Now, I have the following religious belief, and it's that arithmetic operators shouldn't have side effects. That's to say, assignment and evaluation should be separate.

So that when I write

     x = (arithmetic);
..I could be sure that the only outcome of this computation is changing the value of x.

Perhaps calling the function sqrt(x) would summon Cthulhu — I'll read the documentation for it to be sure. But in general, I'd hope that calling abs(x) wouldn't change the value of x to |x| in addition to returning it.

But K&R decided to have fun by saying that "x = 5;" is both an assignment and an expression with a value. Which allows one to write:

      x = y = z = 5;
as a parlor trick.

That's it, that's the only utility.

Instead of defining this as a special initialization syntax and otherwise disallowing it (as Pyhthon does), they went YOLO and made assignment an expression rather than a mere statement.

Which means that the very useful statement "increase the value of this variable by one" became two expressions with different values.

In an ideal world, the following would be equivalent, and would not evaluate to anything you can assign to a variable:

     ++x;
     x += 1;
     x = x + 1;
...while "x++" would not exist at all (or would be equivalent to ++x).

And that's how it is in Go. Thompson fixed the design mistake after 4-5 decades of it giving everyone headaches.

Sadly, C++, Java, C# all wanted to be "like C" in basic syntax, so we're stuck with puzzles like this to this day.

TL;DR: if you're asking "what's the value of the spec that makes assignment an expression", i.e. why is making "a = (b = c + d);" valid syntax a good idea, the answer is:

It isn't. It's a bad decision made in 1970s that modern languages like Go no longer support.

> Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

> It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

I can see the first of these. The second appears to be untrue; if you removed the concept of undefined behavior from C, it wouldn't get farther away from the hardware.

Is that first point actually something that somebody wants? Who benefits from the idea that it's easy to write a "standards-compliant" compiler, because you are technically "standards-compliant" whether you comply with the standard or not?

At that point, you've given up on having a standard, and the interviewers Susam calls out, who say that the correct answer is whatever their compiler says it is, are correct in fact. Susam is the one who's wrong, for reading the standard.

You can run a language that way just fine. I had the impression that Perl was defined by a reference implementation. But it's the opposite of having a standard.

>The second appears to be untrue; if you removed the concept of undefined behavior from C, it wouldn't get farther away from the hardware

My understanding is that even common CPU instruction sets can have undefined behavior[1].

When C was written, the CPU architectures were more of a Wild West. It might have made sense to leave some parts up to the compiler authors on a particular architecture.

>Is that first point actually something that somebody wants?

When C was written — absolutely.

Portability of C code is almost taken for granted these days.

Things were different then. Portability was a big challenge.

All that said, this is my non-authoritative understanding of the reasons why it's a thing. Take it with a grain of salt.

>At that point, you've given up on having a standard

Sure. Just treat C as a family of languages which have a common standardized part.

Proprietary compiler extensions are/were common anyway, so that's not an unusual situation.

[1] https://www.os2museum.com/wp/undefined-isnt-unpredictable/

Assigning to multiple variables in a single expression is fine and useful. Take

``` target[i++] = source1[j++] + source2[k++]; ``` That's idiomatic, it shows the intent to read and consume the value in a single expression. You can write it longer, but not more clearly.

It's only when you assign to the same variable multiple times, or read it after it was assigned, that it introduces ordering issues.

A single `i++` or `++i`/`i += 1` is safe and useful.

>A single `i++` or `++i`/`i += 1` is safe and useful

Sure, and you don't need the assignment to be an expression with a value for it to be useful.

>target[i++] = source1[j++] + source2[k++]; That's idiomatic

That's idiomatic to C for sure.

Also idiomatically horrible. Why are you using three index variables here?

>You can write it longer, but not more clearly.

    target[i] = source1[i] + source2[i];

    i++;
This is absolutely more clear to any sane person, and less prone to error.

You can't forget to increase one if the indices when all three are meant to go in lockstep.

It's longer by one semicolon, and requires far less cognitive overload to parse.

There's a reason why they did away with it in Go. What do you think that reason was if it's so useful?

> Why are you using three index variables here?

> You can't forget to increase one if the indices when all three are meant to go in lockstep.

Obviously they are not in this example.

The next line might contain:

    i++; j *= 42; k = srandom (k), random ();
>Obviously they are not in this example

...of utter insanity which doesn't belong in any real world code.

It just keeps getting worse, and shows why it was a horrible idea to allow this in the first place.

>The next line might contain:

    i++; j *= 42; k = srandom (k), random ();
Then that's where you do the arithmetic.

You're already doing it there, why do you need to do an assignment inside the brackets?

(This was a rhetorical question. You don't).