Hacker News new | ask | show | jobs
by susam 43 days ago
The code in the post seems very similar to the one in my own post from 2010: https://susam.net/sequence-points.html

  int a = 5;
  a += a++ + a++;
I do remember that this particular code snippet (with a = 5, even) used to be popular as an interview question. I found such questions quite annoying because most interviewers who posed them seemed to believe that whatever output they saw with their compiler version was the correct answer. If you tried explaining that the code has undefined behaviour, the reactions generally ranged from mild disagreement to serious confusion. Most of them neither cared about nor understood 'undefined behaviour' or 'sequence points'.

I remember one particular interviewer who, after I explained that this was undefined behaviour and why, listened patiently to me and then explained to me that the correct answer was 17, because the two post-increments leave the variable as 6, so adding 6 twice to the original 5 gives 17.

I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?

12 comments

IMO, The only reasonable answer if asked this in an interview is “I would not write code where I have to know the answer to this question”

These sorts of things are neat trivia to learn about things like sequence points but 99.9% of the time if it matters in your codebase you're writing something unmaintainable.

> IMO, The only reasonable answer if asked this in an interview is “I would not write code where I have to know the answer to this question”

That's half of a reasonable answer. The other half is "but I do know the answer so if I see it when reviewing or working on someone else's code I can flag it or rewrite it, and explain to them why it is bad".

  > The other half is "but I do know the answer
Except you don't!

If you claim to know the answer you've made a grave mistake and fooled yourself.

If you ran the code in a compiler and used that to conclude "this is the answer" rather than "this is an answer" then now is a great time to learn how easy it is to fool yourself. You just need you ask yourself what assumptions you made. I'll wager you assumed all compilers process this line in the same way.

Or just RTFA, or Susam's, as that's exactly what they are about. They explain why this is undefined behavior.

  | The first principle is that you must not fool yourself — and you are the easiest person to fool.
  - Feynman
> I'll wager you assumed all compilers process this line in the same way

You would lose that wager.

What I mean by "I do know the answer" is that I know that this is undefined behavior and why it is undefined behavior and that different compilers can give different results and also that even if I test the compiler I use to see what it does I can't count on that not changing any time the compiler gets updated.

Fair, but that was not clear to me that that's what you meant. It's clear that others interpreted it the way you intended too (since there are multiple replies saying exactly what you said) but I also don't think I'm the only one who misinterpreted. Sorry that I did.
Seems fitting in a thread about undefined behavior
I think you're misinterpreting "I know the answer". The GP is suggesting rewriting it, so the know the issue.
> Except you don't!

Except you can do, because "The answer is that this isn't a valid C program." is a sentence you can know.

Syntactically, it is valid. Though yes, semantically it is invalid. Calling it "valid" is going to just continue the same problem because there's multiple ways to interpret valid here
No it isn't. You don't need to know the answer to know that it is bad code. The very fact that it isn't clear shows that.
Right, the feedback I'd expect in a code review interview is something like "This is unclear or wrong, write what you actually meant".

That's the feedback I would want, and it's the feedback I give to my colleagues in reviews. Actually I tend to be too verbose, so you might get a full paragraph explaining what the ISO document says and that you shouldn't assume it does whatever it is your compiler says.

My actual feelings for this specific case are that the language is defective, but if we're wedded to a defective language then the reviews need to call out such usage.

  > Actually I tend to be too verbose, so you might get a full paragraph explaining what the ISO document
I'm verbose too, but I love it when others are. Honestly, it's usually easy to triage (and I write to try to make it easy). I like verbosity because learning why means I not only won't make that mistake again but I won't make any similar mistakes again.

Verbosity isn't bad. Not everything needs to be a fucking tweet

If you know is this code is bad, but don't know that it is UB, I thing you are rating code on feelings and cargo culting.
I mean the good answer is:

I am not sure this code could be interpreted the same by different programmers and compilers alike. So I would never write it.

That isn't a strong answer, unless the question was about what you think of the code's style. The point is that it's more than just poor style, it's likely to constitute undefined behaviour. Saying different compilers could interpret the expression differently is not equivalent to saying it invokes undefined behaviour.

It's common for interview questions to explore unusual code, perhaps poor quality code, that hinges on edge cases of the language. A candidate with a strong understanding of the language would be expected to take the opportunity to demonstrate it.

You might still make a mistake, even if you think you know the answer. It's much better to instrument the code to figure it out, or write a short test program.
It's Undefined Behavior. So you can instrument all you want, the answer will still be wrong. You'll capture what your particular compiler does under some particular conditions (opt flags, surrounding code, etc.) but that will not be representative of what can happen in the general case (hint : anything can happen with UB).
> It's Undefined Behavior.

Susam's post doesn't make this clear. The quotes from K&R say that the modifications to the variable may take place in any order, but they don't directly say that doing this is Undefined Behavior, which would make it permissible to do anything, including e.g. interpreting the increments as decrements.

The C99 standard is quoted saying this:

>> Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.

It's possible that something else in the standard defines noncompliance with this clause as Undefined Behavior. But that's not the most intuitive interpretation; what this seems to say, to me, is that the line of code `a = a++ + ++a` should fail to compile, because it's not in compliance with a requirement of the language. Compilers that produce any result at all are suffering from a bug.

(It seems more likely that the actual intent is to specify that, given the line of code `b = a++ + ++a`, with a initially equal to 5, the compiler is required to ensure that the value stored at the address of a is never equal to 6 - that it begins at 5, and at some indefinite point it becomes 7, but that there is no intermediate stage between them. But I find the 'compiler failure on attempt to put multiple modifications between two sequence points' interpretation preferable.)

The "shall" in the standard means it's undefined behavior. This is explained in the "Conformance" section,

> 2. If a ‘‘shall’’ or ‘‘shall not’’requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.

Compilers will not refuse to compile the code, indeed the blog post we are all commenting on reports the results from a bunch of different compilers. Historically the reason the C standard specified a lot of undefined behavior is that the actually existing C compilers at the time compiled the code but disagreed about the output.

Compilers are not able to prevent you from violating must/shall in the general case. So they're not held to that bar. Unless the standard says not to compile it, it's not a compiler bug.

Also, imagine a situation where the line of code actually lists three different variables, but all three of them are passed in by address. It quickly becomes impossible for the compiler to know you violated the spec by reusing the same variable. And even optimizations that make sense here could corrupt the value pretty badly and possibly lead to worse errors.

I searched K&R to see if there is any language that implies a += a++ + a++ to be undefined. I couldn't find anything. I found the following excerpt which is closest to what I claim, in spirit. But still, it does not explicitly spell out that an object must not be modified more than once between sequence points. From § A.7 Expressions:

> The precedence and associativity of operators is fully specified, but the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation. However, each operator combines the values produced by its operands in a way compatible with the parsing of the expression in which it appears. This rule revokes the previous freedom to reorder expressions with operators that are mathematically commutative and associative, but can fail to be computationally associative. The change affects only floating-point computations near the limits of their accuracy, and situations where overflow is possible.

So I think, the text in K&R serves as warning against writing such code, at best. The C99 draft has more relevant language. From § 4. Conformance:

> If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

This along with the § 6.5 excerpt already mentioned in my post implies a += a++ + a++ to be undefined. When I get some more time later, I'll make an update to my post to include the § 4. Conformance language too for completeness.

Thank you for the nice comment!

Not nasal demons in this case (https://groups.google.com/g/comp.std.c/c/ycpVKxTZkgw/m/S2hHd...): thaumasiotes shows that we can expect a numeric answer.
I don't see the name "thaumasiotes" at that link, nor do I see anything relevant to the code in the title.

The behavior of "int a = 5; a = a++ + ++a;" is undefined. There is no guarantee of a numeric result, because there is no guarantee of anything.

It doesn't matter if the answer is wrong. You run the test program and then replace the code by the answer. This basically weeds out the UB.
But since it is a UB, there's no guarantee that your test program produces the same result as the same code running on production, even if you have the same compiler.
That's a valid approach, if you only use high-level language to generate assembly faster, and the assembly is your source of truth.
On one hand I've been using almost the exact statement 25 years ago in my Flash (ecmascript) tutorials to narrow down the point of operator precedence.

I still believe it's a good piece on your powerpoint if you want to teach. It's easy to fall, easy to grasp, and easy to unroll all the rules - that is, if the rules are actually set in stone.

On the other hand I've been through couple FAANG interviews, and twice I was presented with something similar and after I glanced at it for a half a minute the interviewer quickly proceed to "a ha!, you don't know! the interview is over , but I'm happy to tell you the right answer".

That part is not cool.

The answers to some questions must be known in order to be able to write a correct program.

In the vast majority of the programming languages, the order of evaluation for the actual parameters passed to a function is undefined. In the few programming languages where the order of evaluation is defined, that is actually a mistake in the design of that programming language.

This is something about which any programmer must be well aware, because when composing function invocations it is very easy to write a function invocation where the result would depend on the order of evaluation of the expressions passed as actual parameters. The arithmetic operators are also function invocations, so that applies to them too.

> when composing function invocations it is very easy to write a function invocation where the result would depend on the order of evaluation of the expressions passed as actual parameters

this simply means your functions aren't pure functions, and is doing side effects. If you rewrite those functions to not have side effects (including ones being used to generate the parameters), there would be zero issues of such nature.

In some sense, and without the interviewer knowing, that is actually a great scenario for an interview.

If you can convince someone in a position of authority that they’re wrong about something technical without upsetting them then you’re probably a good culture fit and someone who can raise the average effectiveness of your team.

Or, also, in the reverse direction, if the interviewer is wrong about it and can't be convinced otherwise, it's probably not a great place to work.
I know I did recommend someone after the interview because I looked it up and they were right. Great person to work with. Though I fully understand why most would hesitate.
The best interview questions spawn discussions. This is a pretty good one for that. We could dive into what makes it UB, why a particular compiler might do it a certain way, what results we'd likely see from other compilers, and why the standard might say that this sort of thing is UB.

"What does this produce?" and expecting an answer of "17" is a bad question even if UB didn't mean the expected answer is wrong.

I don’t work a ton with C, but I wonder how C programmers keep track of what behavior is and is not defined. It seems like there are many possible edge cases.
We get by on a combination of matching patterns (any pointer cast gets a lot of scrutiny, for example), compiler warnings, tools like UBSan, debugging when things go wrong, and sheer dumb luck.

Having an understanding of how the code gets transformed into machine code helps. For this case, there's the basic idea that `a++` will boil down to three basic conceptual operations: fetch, add, and store, and those can be potentially interleaved with other parts of the statement. In something like `a++ + ++b` the interleaving doesn't affect the outcome no matter how it's done. In `a++ + ++b` the interleaving can affect the outcome, and that's your sign that something might be wrong.

Any memory safety issue in C code had to involve UB at some point. And you can see how prevalent those are, and deduce how not-particularly-great we are at keeping track of UB.

> Having an understanding of how the code gets transformed into machine code helps

I'm not sure about that. Knowing assembly is not a substitute for knowing how the language is defined. Sometimes C/C++ programmers with some assembly knowledge reason themselves into thinking that what they're asking of the language must have well-defined behaviour, when in fact it's undefined behaviour. It doesn't really matter whether interleaving order can change the output. (++i)++ is, apparently [0], undefined behaviour in C but has well defined behaviour in C++.

[0] https://stackoverflow.com/a/58841107

I don't mean assembly in this case, but something more like the compiler's view of the code. a++ can be broken down into more primitive operations, and might actually be, depending on how the compiler is implemented. The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.
They don't. In the culture some kinds of undefined behaviour are taken seriously and some aren't. If you want to write code that "works", you emulate what popular performance benchmarks do (whether their code is undefined according to the standard or not), since those are the thing that C compiler developers actually care about.
Personally, when ever I write a modifying statement, I wonder about the domains of the input and ensure, that the condition necessary to stay in the existing range is evaluated. If it is not, I either write the condition, reduce the input domain, or increase the output domain.
They don't really. In fact there are many things that are technically UB but are so common that compilers can't really treat them as UB. E.g. type punning via unions.
Type punning via unions is not UB in C in general, but it is in C++ IIRC.

I write "in general" because, as with other forms of memory reinterpretation (memcpy or copy through a character type), evaluating a trap representation triggers UB.

The short version is that it's fine in C++ as long as you only read the member that was last written to or a char type.
Yeah, undefined behavior just means not defined in the specification.

I would argue that most languages only have one compiler so it doesn't matter what is in the specification.

> I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?

Are you referring to the type of interview questions where the question is ill-defined and no one should know the answer, or the type where the question is reasonable and well-defined, but the interviewer doesn't know the answer?

I had a phone screen with Google once where they asked how to determine the length of a stretch of contiguous 1s within an infinite array of 0s. I suggested that, given the starting index i, you can check the index i+2 and then repeatedly square it until you find yourself among the zeroes, after which you can do binary search to find the transition from ones to zeroes.

The interviewer objected that this will grow the candidate end index too quickly, and the correct thing to do is to check index i+1 and then successively double it until you find the zeroes. We moved on.

I passed that phone screen. But I still resent it, because I checked the math later and "successive squaring followed by binary search" and "successive doubling followed by binary search" take exactly the same amount of time.

I meant the latter. I think the question is fine. It can lead to a good discussion, similar to what we are having in this thread. It has been a long time (almost 20 years), but I remember that most interviewers who asked this seemed to be convinced that the output they had seen with their compiler version was the correct answer. What could be a nice and relevant discussion, especially considering that some classes of bugs and security issues result from it, was seen only as a trivia quiz by the interviewers, with the expectation of an answer that was incorrect, no less.

Your phone screen story is quite nice. When I read your question, I would have answered with successive doubling as well. In fact, I faced the same question at an AWS interview a long time ago. The question was mathematically the same question but formulated differently. I answered with the doubling solution too, which leads to an O(log n) time solution, asymptotically. Your interviewer's immediate objection to your squaring solution seems like a major failure in their intuition. When I read your solution, purely by intuition, that is, without resorting to any rigorous reasoning, I felt: wow, that's interesting, your solution would land on the zero region in merely O(log log n) time. Why didn't I think of it? I think your solution should spark interest rather than dismissal in a curious person. Of course, the binary search after that to find the exact transition point blows up the time consumed back to O(log n).

Once again, thanks for these really interesting comments!

From first principles, it seems unlikely that interviewers selecting their own questions would be able to eliminate this class of question, since by definition they cannot know whether the answer they believe is correct really is correct or not.

I would be 100% behind a movement to replace interviewer freedom with externally-set, vetted questions.

Do you want a job at a place where someone who doesn't understand UB makes the hiring decisions?
Sometimes, even in tech, you just need a job.
I think your options are very limited if you look for places that have people that truly understand UB, even less so the hiring people.
In the land of the blind, the one eyed man is King.
And I thought he was just a senator.
Heh, one time when I got this style of question[1] (but for JavaScript), I took a glance at it and said "Um ... you really shouldn't write code like that." The interviewer replied, "Oh. Yeah. Fair point." And then went on to another question.

[1] By which I mean predicting the behavior of error-prone code that requires good knowledge of all the quirks of the language to correctly answer.

Genuinely curious, so this is undefined behavior and depends on the compiler. I get that. Java, and other languages, can do these same operations but their compilers produce bytecode that runs on a virtual machine (JVM) compiled to machine code just-in-time. Would this same code in Java possibly yield different results based on the platform the JVM was running on because of the platform specific JIT compiler? Maybe that's part of the origin of the phrase "write once, test everywhere".
The UB comes from how C++ standard defines expression sequencing which is not relevant for Java. Languages other than C++ typically define such details more strictly so there is no UB or even concept of UB. JIT compilers don't change it as any non toy JIT will generate native instructions directly or through intermediate representation (instead of generating C++ text and passing that through regular C++ compiler) both of which should have much stricter semantics compared to what C++ guarantees.
late to this but what i meant was the JIT Compiler had to be compiled by a compiler at some point so its behavior is defined by the compiler used to compile it (this goes all the way down to the poor soul who hand compiled the first compiler). I'm assuming things like JIT are platform dependent and therefore the compiler used to build it would also be platform dependent. So the JIT on different platforms could have been built from different compilers choosing their own way of handling undefined behavior. ...unless i'm confusing myself which is likely.
The compiler is compiled via java nowadays. It's bootstrapped.

Yes the first one was probably written in C, however long ago. Not super relevant now though.

JVM/JRE, probably a different story

> Would this same code in Java possibly yield different results based on the platform the JVM was running on because of the platform specific JIT compiler?

No, and it's also well defined in languages like C#.

If we're talking about this specific example at least. No sequence point issues like that in Java.

I'd be badly surprised if the jvm jit went through C, so if this monstrosity is well defined in Java it's well defined once well defined everywhere.

but still, if it were, it was and remained, as gp points out, bad practice...

It's been quite a while, but IIRC, in Java these statements actually do have a defined behavior.

The ++x is a "pre-increment", meaning the value of the variable is incremented prior to evaluating the expression, while the "post-increment" "x++" is the other way around: the expression evaluates to x, then x is incremented afterwards.

All expressions are left-to-right.

That behavior is inherited from C. The pre/post increment behavior is actually the same in every language that uses them. The priority of operation is also usually the same as well.

The reason the question is tricky is because those operators change the value of a as the full expression is progressively executed.

It's not immediately clear to me what the answer in Java would be.

Just take a++ + ++a for example:

If the value if `a` is hoisted by the jvm then it could be 5++ + ++5, so 5 + 6.

But if it's executed left to right and `a` is looked up every time, then it becomes 5++ + ++6, so 5 + 7.

The value of the variable is not hoisted by the Java compiler. (It's not that JVM, that only executes the byte code, what y doesn't have that kind of ambiguities.)

The semantics of Java is not undefined on multiple assignments to the same variable in an expression, so it can't hoist something if it would change the outcome.

Now, I don't actually know what the outcome is, because I don't remember whether `a += e` reads the value of `a` before or after evaluating `e`. The code is still confusing and unreadable to humans, so you shouldn't write it, but the compiler behavior is not undefined.

And if your variable is accessed from multiple threads, it may be undefined which intermediate values night be seen.

    $ cat a.java 
    class a {
        public static void main (String[] args) {
            int a = 4;
            int b = a++ + ++a;
            System.out.println(b);
            System.out.println(a);
        }
    }

    $ javac a.java 
    $ java a
    10
    6
> I found such questions quite annoying because most interviewers who posed them seemed to believe that whatever output they saw with their compiler version was the correct answer.

Other than the job for most programmers having nothing to do with whether they know the outcome, because hopefully they'd never write something like it or clean it up. And IF they found it they'd hopefully test it - given that it appears to be compiler dependent anyways.

Both major compilers yell at you for this nowadays... it's pretty unforgivable IMHO for somebody to be asking it as an exam or interview question if the right answer isn't "undefined":

    <source>:5:10: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
        5 |     a = a++ + ++a;
          |         


    <source>:5:7: warning: operation on 'a' may be undefined [-Wsequence-point]
        5 |     a = a++ + ++a;
          |     ~~^~~~~~~~~~~
The interviewer asking stuff like that is a good sign to leave immediately.
Maybe the interviewer seeks to hear something like "This is UD, this code needs to be rewritten, should not pass code review. What prevents you from using -Wall when compiling?"
If they're wasting time during an interview on nonsense like that, nah.
>I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?

I just refuse to do interviews like that any more.

How many tennis balls can fit in a bus?
Under what pressure?
Nice I must remember this for the next interview I'm going to attend
Assuming the tennis balls have been digest and shit out by wombat. We thusly have an efficient Cartesian packing ...
obviously under the maximum allowable pressure that each surface of the bus can withstand.
Well... tried it on macOS using vanilla gcc, the results surprised me:

  $ /bin/cat x.c; gcc -w -o x x.c; ./x
  #include <stdio.h>
  
  int main()
  {
      int a = 5;
      a += a++ + a++;
      printf("a = %d\n", a);
  }
  a = 18
Not what I expected. This must be how it works:

- The first a++ expression results in 5, after a = 6 - The second a++ expression results in 6, after a = 7 - Only then the LHS a is evaluated for the addition-assignment, so we get: a = 7 + 5 + 6 = 18

the original question has a=, you have a+=
They're using the version from the top comment, not the post. It also switches the pre-increment to post-increment.
I was just replying to the comment ¯\_(ツ)_/¯