Hacker News
by MaxBarraclough 31 days ago
> Having an understanding of how the code gets transformed into machine code helps

I'm not sure about that. Knowing assembly is not a substitute for knowing how the language is defined. Sometimes C/C++ programmers with some assembly knowledge reason themselves into thinking that what they're asking of the language must have well-defined behaviour, when in fact it's undefined behaviour. It doesn't really matter whether interleaving order can change the output. (++i)++ is, apparently [0], undefined behaviour in C but has well-defined behaviour in C++.

[0] https://stackoverflow.com/a/58841107


I don't mean assembly in this case, but something more like the compiler's view of the code. a++ can be broken down into more primitive operations, and might actually be, depending on how the compiler is implemented. The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.

> The fact that the ordering of those more primitive operations with respect to other operations isn't very tightly constrained is something you'd just have to know about the language, I suppose.

No, that's not right. It's undefined behaviour, not merely an unspecified order of evaluation. Roughly speaking, the behaviour of the entire program is unconstrained by the language standard after execution of that statement. It could crash the whole process, for instance, or go haywire.

(Again, that's in C, apparently, but not in C++.)
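To make the distinction concrete, here is a minimal sketch in C. The helper names (f, g, sum_in_some_order, call_log) are made up for illustration. The first function only has an unspecified order of evaluation, so it has two permitted outcomes and both give the same sum. The commented-out snippet is the undefined-behaviour case, which is why it is not executed.

```c
#include <stddef.h>

/* Hypothetical helpers: each one records that it ran, in call order. */
static char call_log[3];
static size_t call_count = 0;

static int f(void) { call_log[call_count++] = 'f'; return 1; }
static int g(void) { call_log[call_count++] = 'g'; return 2; }

/* Unspecified order of evaluation: the standard permits f-then-g or
   g-then-f, but each call happens exactly once and the calls do not
   interleave, so the result is 3 either way. */
int sum_in_some_order(void) {
    call_count = 0;
    call_log[0] = call_log[1] = call_log[2] = '\0';
    return f() + g();
}

/* Undefined behaviour (deliberately left commented out): i is modified
   twice with no sequencing between the modifications, so the C standard
   places no requirements on the program at all.

       int i = 0;
       i = i++ + i++;
*/
```

After calling sum_in_some_order(), call_log holds either "fg" or "gf", and a portable program has to be correct for both.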

It's worse than that, the behavior of the entire program is unconstrained by the language standard beforehand too. Raymond Chen discusses how things can go wrong once you're going to reach UB even before you get to it: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

Anyway, I didn't mean to imply that things behaved as written aside from ordering issues. I only meant that this sort of principle can help you remember where UB lurks. Generally, where a kind C compiler might just mess with your numbers a bit, an evil C compiler can legally make demons fly out of your nose.
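As a rough illustration of how a compiler can reason from UB to rewrite the code around it, here is a small sketch in C. It is not the example from the linked post, and the function names are made up. In the risky version the dereference comes first, so a compiler is allowed to assume the pointer is non-null and may drop the later check entirely.

```c
#include <stddef.h>

/* Risky ordering: *p is read before the check. Dereferencing a null
   pointer is UB, so a compiler may assume p is non-null from the first
   statement onward and delete the check below as dead code. */
int read_or_default_risky(const int *p) {
    int value = *p;
    if (p == NULL) {
        return -1;
    }
    return value;
}

/* Safe ordering: no dereference happens unless p is known to be non-null. */
int read_or_default(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Both versions behave the same for valid pointers. Only read_or_default may be called with NULL.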

> It's worse than that, the behavior of the entire program is unconstrained by the language standard beforehand too. Raymond Chen discusses how things can go wrong once you're going to reach UB even before you get to it

Heh, yes, that's exactly what I was thinking of when I put "roughly speaking".

> where a kind C compiler might just mess with your numbers a bit, an evil C compiler can legally make demons fly out of your nose

Yes, signed integer overflow being another. Presumably it's defined that way as it's simpler than trying to spell out all the behaviours the compiler is permitted to implement, and on top of that there are trap representations to worry about. I doubt modern compilers get much optimization benefit from it though. There's a StackOverflow thread discussing the reasons it's defined this way: https://stackoverflow.com/q/1860461

Apparently signed integer overflow UB helps with loop optimizations because it makes it easy to prove the loop always terminates. I assume that's not why it's UB, though; surely it's UB because some systems trapped on overflow, or produced different results due to using 1's complement, and the optimization side of the rule was a happy accident. There's a lot of history in this language and it really shows.

> Apparently signed integer overflow UB helps with loop optimizations because it makes it easy to prove the loop always terminates
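A minimal sketch of the kind of loop this argument usually refers to, with a made-up function name. If signed overflow wrapped, then for n == INT_MAX the condition i <= n would always hold and the loop would never terminate. Because overflow of i is UB, a compiler may assume it never happens and treat the loop as running exactly n + 1 times when n >= 0.

```c
/* Sums data[0] through data[n] inclusive, using a signed index.
   The caller must ensure data has at least n + 1 elements. */
long sum_inclusive(const int *data, int n) {
    long total = 0;
    for (int i = 0; i <= n; i++) {
        total += data[i];
    }
    return total;
}
```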

I tried googling but couldn't find hard numbers on the performance impact of GCC's -fwrapv flag. As you'd imagine, it forces wrapping for overflowing arithmetic on signed integer types. [0]
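For comparison, wrapping semantics can be had without any flag by doing the arithmetic in unsigned, where overflow is defined to wrap. This is only a sketch: wrapping_add is a made-up name, and the conversion back to int assumes the usual two's complement ranges (UINT_MAX == 2 * INT_MAX + 1).

```c
#include <limits.h>

/* Adds two ints with wraparound instead of undefined behaviour.
   Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
int wrapping_add(int a, int b) {
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    if (sum <= (unsigned int)INT_MAX) {
        return (int)sum;
    }
    /* Map the upper half of the unsigned range onto the negative ints
       without relying on an out-of-range unsigned-to-signed conversion. */
    return -(int)(UINT_MAX - sum) - 1;
}
```

With this, wrapping_add(INT_MAX, 1) gives INT_MIN rather than invoking UB.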

I also have to wonder how instructive that would be anyway. The GCC devs presumably don't prioritise the performance impact of that flag. If the C language mandated it, they might find other ways to achieve similar optimisation.

This page [1] looks at the related flag for trap-on-signed-overflow, and finds an impact of very roughly 6%.

See also: John Regehr's posts on this topic. [2][3] He dislikes Java's implicit wrap behaviour, as it's rarely what the programmer wants to happen. Java programmers almost never use the addExact method, [4] as it's so syntactically clumsy.
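For the C side of this, a rough analogue of addExact might look like the sketch below. checked_add is a made-up name. The range test is done before the addition, so the addition itself never overflows. GCC and Clang also offer __builtin_add_overflow, and C23 adds ckd_add in <stdckdint.h>, which do the same job with less ceremony.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true if the sum fits in an int.
   Returns false and leaves *result untouched if it would overflow. */
bool checked_add(int a, int b, int *result) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```

It is just as clumsy at the call site as the Java method, which is rather the point being made.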

> surely it's UB because some systems trapped on overflow, or produced different results due to using 1's complement, and the optimization side of the rule was a happy accident

Per this StackOverflow answer [5] I think you have it right.

[0] https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

[1] https://danluu.com/integer-overflow/

[2] https://blog.regehr.org/archives/1401

[3] https://blog.regehr.org/archives/1154

[4] https://docs.oracle.com/en/java/javase/25/docs/api/java.base...

[5] https://stackoverflow.com/a/18195756