| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by amelius 42 days ago
	You might still make a mistake, even if you think you know the answer. It's much better to instrument the code to figure it out, or write a short test program.

1 comments

bestouff 42 days ago

It's Undefined Behavior. So you can instrument all you want, the answer will still be wrong. You'll capture what your particular compiler does under some particular conditions (opt flags, surrounding code, etc.) but that will not be representative of what can happen in the general case (hint : anything can happen with UB).

thaumasiotes 42 days ago

> It's Undefined Behavior.

Susam's post doesn't make this clear. The quotes from K&R say that the modifications to the variable may take place in any order, but they don't directly say that doing this is Undefined Behavior, which would make it permissible to do anything, including e.g. interpreting the increments as decrements.

The C99 standard is quoted saying this:

>> Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.

It's possible that something else in the standard defines noncompliance with this clause as Undefined Behavior. But that's not the most intuitive interpretation; what this seems to say, to me, is that the line of code `a = a++ + ++a` should fail to compile, because it's not in compliance with a requirement of the language. Compilers that produce any result at all are suffering from a bug.

(It seems more likely that the actual intent is to specify that, given the line of code `b = a++ + ++a`, with a initially equal to 5, the compiler is required to ensure that the value stored at the address of a is never equal to 6 - that it begins at 5, and at some indefinite point it becomes 7, but that there is no intermediate stage between them. But I find the 'compiler failure on attempt to put multiple modifications between two sequence points' interpretation preferable.)

vilhelm_s 42 days ago

The "shall" in the standard means it's undefined behavior. This is explained in the "Conformance" section,

> 2. If a ‘‘shall’’ or ‘‘shall not’’requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.

Compilers will not refuse to compile the code, indeed the blog post we are all commenting on reports the results from a bunch of different compilers. Historically the reason the C standard specified a lot of undefined behavior is that the actually existing C compilers at the time compiled the code but disagreed about the output.

thaumasiotes 41 days ago

> Compilers will not refuse to compile the code, indeed the blog post we are all commenting on reports the results from a bunch of different compilers.

Yes, I see that. I just said they should refuse.

masklinn 41 days ago

Because this specific UB is static (not usually the case) both gcc and clang will flag it if Wsequence-point is enabled (and it is part of Wall) (technically the clang warning is Wunsequenced but aliased to the GCC version).

edit: apparently Wunsequenced is enabled by default so clang should warn you out of the box.

Dylan16807 42 days ago

Compilers are not able to prevent you from violating must/shall in the general case. So they're not held to that bar. Unless the standard says not to compile it, it's not a compiler bug.

Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable. And even optimizations that make sense here could corrupt the value pretty badly and possibly lead to worse errors.

thaumasiotes 42 days ago

> Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable.

OK. What is the value of a spec to which compliance is impossible?

aidenn0 42 days ago

The compiler does comply to the spec. It's the program that fails to comply with the spec. It's definitely possible to write programs that have no undefined behavior.

thaumasiotes 41 days ago

The compiler is supposed to compile programs that comply with the spec, and not compile programs that don't.

The concept of "compiling a program that doesn't comply with the spec" doesn't even exist! A text file that doesn't comply with the C spec isn't a C program. That's what it means to be "the spec".

lmm 41 days ago

> OK. What is the value of a spec to which compliance is impossible?

It lets you tell people you have a spec? It makes it easy for compiler developers to dismiss bug reports with "your code violated the spec"?

Dylan16807 42 days ago

Welcome to C.

But more seriously it's the job of the program to not do undefined things.

lupire 41 days ago

It's the job of a language designer to define everything.

alterom 41 days ago

>What is the value of a spec to which compliance is impossible?

Are you saying, what's the value of a language spec that allows undefined behavior, as C does?

Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

It's also difficult to define a formal system that won't have undefined expressions. Mathematics itself is full of them (in logic, "this sentence is a lie" has no truth value; you can't define the set of all sets, or a set of sets that don't contain themselves; etc).

That said, I think we've settled on a rather silly choice here with the "++" operator.

Personally, I'd do away with the ++ operator in either pre- or post- increment forms, or at least disallow it in arithmetic expressions.

The only thing having it realistically accomplished is saving a few characters when writing a for-loop in C.

Even for that it's not necessary.

The problem with it is that, unlike normal arithmetic operators, it both returns a value and assigns one, which means that you can assign values to several variables in a single arithmetic expression, as in

     a = b++;

...which C, in general, allows, as in:

     a = (b = b + 1);

The result of these two expressions, of course, is different.

Now, I have the following religious belief, and it's that arithmetic operators shouldn't have side effects. That's to say, assignment and evaluation should be separate.

So that when I write

     x = (arithmetic);

..I could be sure that the only outcome of this computation is changing the value of x.

Perhaps calling the function sqrt(x) would summon Cthulhu — I'll read the documentation for it to be sure. But in general, I'd hope that calling abs(x) wouldn't change the value of x to |x| in addition to returning it.

But K&R decided to have fun by saying that "x = 5;" is both an assignment and an expression with a value. Which allows one to write:

      x = y = z = 5;

as a parlor trick.

That's it, that's the only utility.

Instead of defining this as a special initialization syntax and otherwise disallowing it (as Pyhthon does), they went YOLO and made assignment an expression rather than a mere statement.

Which means that the very useful statement "increase the value of this variable by one" became two expressions with different values.

In an ideal world, the following would be equivalent, and would not evaluate to anything you can assign to a variable:

     ++x;
     x += 1;
     x = x + 1;

...while "x++" would not exist at all (or would be equivalent to ++x).

And that's how it is in Go. Thompson fixed the design mistake after 4-5 decades of it giving everyone headaches.

Sadly, C++, Java, C# all wanted to be "like C" in basic syntax, so we're stuck with puzzles like this to this day.

TL;DR: if you're asking "what's the value of the spec that makes assignment an expression", i.e. why is making "a = (b = c + d);" valid syntax a good idea, the answer is:

It isn't. It's a bad decision made in 1970s that modern languages like Go no longer support.

thaumasiotes 41 days ago

> Well, it's that it allows for compiler implementations that aren't too hard to implement and maintain.

> It allows for a language that's close enough to hardware (and allows you to do programming on a low level), while still offering a reasonable amount of abstraction to be useful (and usable).

I can see the first of these. The second appears to be untrue; if you removed the concept of undefined behavior from C, it wouldn't get farther away from the hardware.

Is that first point actually something that somebody wants? Who benefits from the idea that it's easy to write a "standards-compliant" compiler, because you are technically "standards-compliant" whether you comply with the standard or not?

At that point, you've given up on having a standard, and the interviewers Susam calls out, who say that the correct answer is whatever their compiler says it is, are correct in fact. Susam is the one who's wrong, for reading the standard.

You can run a language that way just fine. I had the impression that Perl was defined by a reference implementation. But it's the opposite of having a standard.

reichstein 41 days ago

Assigning to multiple variables in a single expression is fine and useful. Take

``` target[i++] = source1[j++] + source2[k++]; ``` That's idiomatic, it shows the intent to read and consume the value in a single expression. You can write it longer, but not more clearly.

It's only when you assign to the same variable multiple times, or read it after it was assigned, that it introduces ordering issues.

A single `i++` or `++i`/`i += 1` is safe and useful.

susam 42 days ago

I searched K&R to see if there is any language that implies a += a++ + a++ to be undefined. I couldn't find anything. I found the following excerpt which is closest to what I claim, in spirit. But still, it does not explicitly spell out that an object must not be modified more than once between sequence points. From § A.7 Expressions:

> The precedence and associativity of operators is fully specified, but the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation. However, each operator combines the values produced by its operands in a way compatible with the parsing of the expression in which it appears. This rule revokes the previous freedom to reorder expressions with operators that are mathematically commutative and associative, but can fail to be computationally associative. The change affects only floating-point computations near the limits of their accuracy, and situations where overflow is possible.

So I think, the text in K&R serves as warning against writing such code, at best. The C99 draft has more relevant language. From § 4. Conformance:

> If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

This along with the § 6.5 excerpt already mentioned in my post implies a += a++ + a++ to be undefined. When I get some more time later, I'll make an update to my post to include the § 4. Conformance language too for completeness.

Thank you for the nice comment!

tengwar2 42 days ago

Not nasal demons in this case (https://groups.google.com/g/comp.std.c/c/ycpVKxTZkgw/m/S2hHd...): thaumasiotes shows that we can expect a numeric answer.

_kst_ 42 days ago

I don't see the name "thaumasiotes" at that link, nor do I see anything relevant to the code in the title.

The behavior of "int a = 5; a = a++ + ++a;" is undefined. There is no guarantee of a numeric result, because there is no guarantee of anything.

susam 42 days ago

I believe they were referring to thaumasiotes's thread here: https://news.ycombinator.com/item?id=48141294

I think the objection thaumasiotes has raised there is valid and I have made an attempt to answer it as well in the same thread.

HarHarVeryFunny 41 days ago

It's only the order of evaluation that is undefined.

_kst_ 41 days ago

No, the behavior is undefined. That means, quoting the ISO C standard, "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements".

A conforming implementation could reject it at compile time, or generate code that traps, or generate code that set a to 137, or, in principle, generate code that reformats your hard drive. Some of these behaviors are unlikely, but none are forbidden by the language standard.

HarHarVeryFunny 41 days ago

I was wrong.

I was looking at this:

https://en.cppreference.com/cpp/language/eval_order

I'm not sure where precisely this sequencing exception to the default "eval order undefined" rule is given, but after the 24(!) sequencing rules they do give this "++i + i++" as an explicit example of undefined behavior.

Interestingly that page says that since C++17 f(++i, ++i) is "unspecified" rather than "undefined", whatever that means, and presumably plus(++i, i++) would be too, which seems a bit inconsistent.

afdbcreid 41 days ago

Nope, there is no sequence point in the middle and modifying an object more than once between sequence points is undefined behavior.

amelius 42 days ago

It doesn't matter if the answer is wrong. You run the test program and then replace the code by the answer. This basically weeds out the UB.

yongjik 42 days ago

But since it is a UB, there's no guarantee that your test program produces the same result as the same code running on production, even if you have the same compiler.

amelius 42 days ago

That's very unlikely, and in the worst case you've reduced a difficult bug into an easier to understand bug.

1718627440 41 days ago

That's a valid approach, if you only use high-level language to generate assembly faster, and the assembly is your source of truth.