> The above example will print the value of a, but it won’t be initialized to 123!
It certainly could do though. In C, using an uninitialised variable does not mean "whatever that memory happened to have in it before" (although that is a potential result). Instead, it's undefined behaviour, so the compiler can do what it likes.
For example, it could well unconditionally initialise that memory to 123. Alternatively, it could notice that the whole snippet has undefined behaviour so simply replace it with no instructions, so it doesn't print anything at all. It could even optimise away the return that presumably follows that code in a function, so it ends up crashing or doing something random. It could even optimise away the instructions before that snippet, if it can prove that they would only be executed if followed by undefined behaviour – essentially the undefined behaviour can travel back in time!
UB cannot travel back in time in C. It is true that it can affect previous instructions, but code being reordered or transformed in complicated ways happens even without UB.
Yes, random blog posts did a lot of damage here. Also broken compilers [1]. Note that blog post is correct about C++ but incorrectly assumes this is true for C as well.
I'm inclined to trust Raymond Chen and John Regehr on these matters, so if you assert that they're incorrect here then a source to back up your assertion would help your argument.
I am a member of WG14. You should check the C standard. I do not see how "time-travel" is a possible reading of the definition of UB in C. We added another footnote to C23 to counter this idea:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
"Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program."
I should point out that compilers also generally do not do true time-travel: Consider this example: https://godbolt.org/z/rPG14rrbj
The issue you linked to is not a counter example because, as the poster said, g may terminate the program in which case that snippet does not have undefined behaviour even if b is zero. The fact that they bothered to mention that g may terminate the program seems like an acknowledgement that it would be valid to do that time travelling if it didn't.
> Note that blog post is correct about C++ but incorrectly assumes this is true for C as well.
Presumably you're referring to this line of the C++ standard, which does not appear in the C standard:
> However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
I looked at every instance of the word "undefined" in the C standard and, granted, it definitely didn't have anything quite so clear about time travel as that. But it also didn't make any counter claims that operations before are valid. It pretty much just said that undefined behaviour causes behaviour that is undefined! So, without strong evidence, it seems presumptuous to assume that operations provably before undefined behaviour are well defined.
The poster is me. You are right that this is not an example of time travel. There aren't really good examples of true time travel because compilers generally do not do this. But my point is that with compilers behaving like this, people might confuse this with time-travelling UB. I have certainly met some who did, and the blog post seems to have similar examples (but I haven't looked closely now).
Yes, and with undefined behavior, the compiler has to emit code that has the behavior defined by the code up to the operation that has undefined behavior.
That is false. If a compiler determines that some statement has undefined behavior, it can treat it as unreachable, and, transitively, other code before it as unreachable.
printf("hello\n"); // this doesn't have to print
x = x / 0; // because this is effectively a notreached() assertion
This is in direct contradiction to what uecker says. Can you back up your claim -- for both C and C++? Putting your code in godbolt with -O3 did not remove the print statement for me in either C or C++. But I didn't experiment with different compilers or compiler flags, or more complicated program constructions.
I've often said that I've never noticed any surprising consequences from UB personally. I know I'm on thin ice here and running the risk of looking very ignorant. There are a lot of blog posts and comments that spread what seems like FUD from my tiny personal lookout. It just seems hard to come across measurable evidence of actual miscompilations happening in the wild that show crazy unpredictable behaviour -- I would really like to have some of it to even be able to start tallying the practical impact.
And disregarding whatever formulations there are in the standard -- I think we can all agree that insofar as compilers don't already do this, they should be fixed to reject programs with an error message if they can prove UB statically -- instead of silently producing something else or acting as if the code didn't exist.
Is there an error in my logic -- is there a reason why this shouldn't be practically possible for compilers to do, just based on how UB is defined? With all the flaws that C has, UB seems like a relatively minor one to me in practice.
This is an adaptation of the Raymond Chen post, and it seems to actually compile to a "return 1" when compiling with C++ (not with C), at least with the settings I tried. And even the "return 1" is understandable to me, given that we actually hit a bug and there are no observable side effects before the UB happens. (But again, the compiler should instead be so friendly as to emit a diagnostic about what it's doing here, or better, return an error).
Un-comment the printf statement and you'll see that the code totally changes. The printf actually happens now. So again, what uecker says about observable effects seems to apply.
If a compiler can determine that some statement is UB, it can treat that as an assertion that the code is unreachable. All other statements which reach only that code and no other are also unreachable.
A compiler's analysis can go backward in time. That is to say, the compiler can build a model of what happens in some section of code over time, and analyze it whichever way it wants.
You cannot go back in time from execution time to translation time, but the translator can follow the code as if it were executing it at translation time.
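A minimal sketch of the uncontested direction of this reasoning, using a hypothetical helper `read_or_default` that is not from the thread. Because the dereference comes first, an optimiser is allowed to conclude that `p` is non-null for the rest of the function and may drop the later check. Whether a particular compiler actually does so depends on the compiler, its version and its flags, so treat this as an illustration rather than a statement about any specific toolchain.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only. */
int read_or_default(const int *p)
{
    int value = *p;      /* undefined behaviour if p is NULL */

    if (p == NULL)       /* an optimiser may assume this is never true here */
        return -1;       /* ...and is then free to drop this branch */

    return value;
}
```

Called with a valid pointer the function is fully defined and returns the pointed-to value. The sketch says nothing about side effects that happen before the dereference, which is the part the thread disagrees on.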
In C, it is only undefined behavior to access an automatic object that has not been initialized.
Static objects are always initialized, so the situation cannot arise.
That leaves dynamic ones, like uninitialized struct members in a malloced structure.
Accessing uninitialized dynamic memory isn't undefined behavior in C. It results in whatever value is implied by the uninitialized bits. If the type in question has no trap representations, then it cannot fail.
That strikes me as better. The original macro presumably misbehaves if there's more than one statement in a sequence, as the if will only affect the first statement.
Both versions of the macro make this fall through from 0:
switch (a) {
brkcase 0: foo();
case 1: bar();
}
so in a sense the `if (0) case` trick also affects the previous case, not the current one. But that one also falls apart when there are multiple statements under the brkcase.
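A small sketch of that fall-through, assuming the simpler variant is defined as `break; case` (the thread does not show the actual macro, so this definition is a guess, and `run_switch` is a made-up name). The function returns a bitmask so the effect can be checked without printing.

```c
/* Assumed definition; the real macro is not shown in the thread. */
#define brkcase break; case

/* Returns a bitmask of which arms ran: bit 0 = case 0, bit 1 = case 1. */
int run_switch(int a)
{
    int ran = 0;

    switch (a) {
    brkcase 0: ran |= 1;   /* expands to: break; case 0: ran |= 1; */
    case 1:    ran |= 2;   /* plain case, so case 0 falls through into it */
    }

    return ran;
}
```

With this definition `run_switch(0)` runs both arms, because the `break` hidden in `brkcase` sits before the `case 0:` label and therefore ends the previous arm, not the current one.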
Nitpick: you could replace sys.stdout.write(f"{n}\n") with print(n). The current code looks very much like it was written for Python 2 (apart from the f string!), where print was a statement. As of Python 3, print is just a regular function. It returns None, which is falsey, so you'd also need to change your first "and" to an "or".
malloc() can't be implemented in C either because it's defined as doing things (creating new memory objects) there are no lower level mechanisms in C to do.
Thanks to array decay to pointer, we basically have `*(array_label+offset)` which in this case of yours we have `*(offset+array_label)`; in other words, `*(arr+4)` is the same as `*(4+arr)`...that's it, really!
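A quick way to check this for yourself; the function name `index_forms_agree` and the array contents are just made up for the illustration.

```c
/* Returns 1 if all four spellings of "element 4 of arr" agree. */
int index_forms_agree(void)
{
    int arr[8] = {10, 11, 12, 13, 14, 15, 16, 17};

    return arr[4] == 4[arr]        /* subscript is commutative */
        && 4[arr] == *(arr + 4)    /* a[b] is defined as *(a + b) */
        && *(4 + arr) == 14;       /* and addition is commutative */
}
```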
Duff is relying on the fact you're allowed to intermingle the switch block and the loop in K&R C's syntax, the (common at the time but now generally frowned on or even prohibited in new languages) choice to drop-through cases if you don't explicitly break, and the related fact that C lets your loop jump back inside the switch.
Duff is trying to optimise MMIO. You wouldn't do anything close to this today even in C, not least because your MMIO is no longer similarly fast to your CPU instruction pace, and for non-trivial amounts of data you have DMA (which Duff's hardware did not). In a modern language you also wouldn't treat "MMIO" as just pointer indirection. To keep this working in C, they have kept adding hacks to the type system rather than saying: OK, apparently this is an intrinsic, so we should bake it into the freestanding mode of the stdlib.
Edited to add:
For my money the successor to Tom Duff's "Device" is WUFFS' "iterate loops" mechanism where you may specify how to partially unroll N steps of the loop, promising that this has equivalent results to running the main loop body N times but potentially faster. This makes it really easy for vectorisation to see what you're trying to do, while still handling those annoying corner cases where M % N != 0 correctly because that's the job of the tool, not the human.
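For readers who haven't seen it, here is a sketch of the switch-interleaved-with-loop construct described above. Note this is the memory-to-memory variant (both pointers advance), whereas Duff's original wrote every element to a single output register. The name `duff_copy` and the early return for a zero count are choices made for this sketch.

```c
#include <stddef.h>

/* Copies count shorts, unrolled 8 times, in the style of Duff's device. */
void duff_copy(short *to, const short *from, size_t count)
{
    if (count == 0)
        return;                      /* the do-while below assumes count > 0 */

    size_t n = (count + 7) / 8;      /* number of passes through the loop */

    switch (count % 8) {             /* jump into the middle of the body */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);       /* loop jumps back inside the switch */
    }
}
```

The `count % 8` leftover is handled by the first, partial pass: the switch enters the unrolled body part-way through, and every later pass runs all eight copies.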
The overarching point appears to be getting rid of angle brackets, which is not something that Duff is doing. Further, Duff's device keeps case labels on the left of its control structure; moving ifs to the left is the other "innovation" here.
I think you really have to squint your eyes to see the similarities, beyond the general theme of exploiting the counterintuitive properties of switch statements.
While it's convenient technically to have unified memory and so it makes a lot of sense for your machine code, in fact the MMIO isn't just memory, and so to make this work anyway in the C abstract machine they invented the "volatile" qualifier. (I assume you weren't involved back then?)
This should be a suite of intrinsics. It's the same mistake as "register" storage, a layer violation, the actual mechanics bleeding through into the abstract machine and making an unholy mess.
If you had intrinsics it's obvious where the platform specific behaviour lives. Can we "just" do unaligned 32-bit stores to MMIO? Can we "just" write one bit of a hardware register? It depends on your platform and so as an intrinsic it's obvious how to reflect this, whereas for a type qualifier we have no idea what the compiler did and the ISO document of course has to be vague to be inclusive of everybody.
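To make the "write one bit of a hardware register" question concrete, here is a minimal sketch using the `volatile` qualifier. An ordinary variable stands in for the register, since a real address would come from a datasheet that is outside this thread; `fake_status_register`, `set_ready_bit` and the choice of bit 3 are all hypothetical.

```c
#include <stdint.h>

/* Stand-in for a device register (assumption: real hardware would use a
   fixed address instead of an ordinary variable). */
static uint32_t fake_status_register;

/* Hypothetical helper: set bit 3 through a volatile-qualified pointer. */
void set_ready_bit(volatile uint32_t *reg)
{
    /* In the source this is one statement, but it is a volatile read of the
       whole word followed by a volatile write of the whole word. */
    *reg |= UINT32_C(1) << 3;
}

/* Runs the helper against the stand-in and returns the resulting value. */
uint32_t demo_set_ready_bit(void)
{
    fake_status_register = 0;
    set_ready_bit(&fake_status_register);
    return fake_status_register;
}
```

What constitutes an access to a volatile object is implementation-defined, which is the vagueness being debated here: the source alone does not say which instructions end up touching the device.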
I wasn't involved back then, but I know the history. I thought you were talking about something more recent.
But this is all opinions, and terms such as "unholy mess" etc. do not impress me. In my opinion "volatile" is just fine, as is "register". Neither is a layer violation or a type system problem. That the exact semantics of a volatile access are implementation-defined seems natural. How is this better with an intrinsic? What I would call a mess are the atomics intrinsics, which - despite being intrinsics - are entirely unsafe and dangerous and indeed a mess (just saw a couple of new bugs in our bug tracker).
Certainly not. That's the purpose of the article where they say in the final sentence that it's entirely possible to write readable, yet totally befuddling code in C that stands a chance in the IOCCC.
If only there was a way of using setjmp/longjmp-style contexts instead of goto, un/winding the stack as required. So we could travel around in time... unfortunately you can't work with a setjmp buffer before it's actually created, unlike gotos.
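A minimal sketch of the ordering constraint mentioned here: the `jmp_buf` only becomes usable once `setjmp` has filled it in, so any `longjmp` has to happen later, from code called while the `setjmp` caller is still active. The names `checkpoint`, `fail_deep` and `run_with_checkpoint` are invented for the example.

```c
#include <setjmp.h>

static jmp_buf checkpoint;          /* meaningless until setjmp() fills it in */

/* Recurse a few frames down, then jump back out past all of them. */
static void fail_deep(int depth)
{
    if (depth == 0)
        longjmp(checkpoint, 1);     /* resumes at the setjmp() call below */
    fail_deep(depth - 1);
}

/* Returns 1 if control came back via longjmp, 0 otherwise. */
int run_with_checkpoint(void)
{
    if (setjmp(checkpoint) == 0) {  /* direct call: setjmp returns 0 */
        fail_deep(3);
        return 0;                   /* not reached: fail_deep never returns */
    }
    return 1;                       /* second return, via longjmp */
}
```

Unlike a `goto` label, which exists as soon as the function is compiled, the jump target here is created at run time, so there is no way to jump "forward" to a `setjmp` that hasn't executed yet.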
My undergrad was entirely in the C language and I’m very glad for it. Sometimes more modern languages can throw me for a loop, no pun intended, but the beauty (and horror) of C is that you are pretty close to the metal, it’s not very abstracted at all, and it allows you a lot of freedom (which is why it’s so foot gunny).
I will never love anything as much as I love C, but C development jobs lie in really weird fields I’m not interested in, and I’m fairly certain I am not talented enough. I have seen C wizardry up close that I know I simply cannot do. However, one of the more useful exercises I ever did was implement basic things like a file system, command line utilities like ls/mkdir etc. Sometimes they are surprisingly complex, sometimes no.
After you program in C for a while certain conventions meant to be extra careful kind of bubble up in languages in a way that seems weird to other people. For example, I knew a guy that’d auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1). The former will not compile if you accidentally use variable assignment instead of equality operator (which everyone has done at some point).
This tendency bites me a lot in some programming cultures; people (ime) tend to find this style of programming overly defensive.
> if they didn’t use the syntax if (1==x) rather than if (x==1). The former will not compile if you accidentally use variable assignment instead of equality operator
No need for Yoda notation. clang will warn of this by default and gcc will do so if you compile with -Wall, which should also be your default.
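For anyone following along, the three spellings side by side. The function `is_one` is just a made-up example; the two problematic variants are kept in comments so the snippet compiles cleanly.

```c
/* Returns 1 if x equals 1, otherwise 0. */
int is_one(int x)
{
    /* Typo variant:  if (x = 1)  compiles, assigns 1 to x and is always
       true. This is what the -Wparentheses warning (part of gcc's -Wall)
       is meant to catch. */

    /* Yoda typo:     if (1 = x)  is a hard error, because 1 is not an
       lvalue and cannot be assigned to. */

    if (x == 1)          /* the readable form, guarded by the warning */
        return 1;
    return 0;
}
```

If the assignment really is intended, the usual way to say so is an extra pair of parentheses, `if ((x = next()))`, which silences the warning (`next` being a placeholder here).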
> for example I knew a guy that’d auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1). The former will not compile if you accidentally use variable assignment instead of equality operator
I've seen that one and personally dislike that mindset: Making the code less readable to compensate for a disinterest in using actual static analysis tooling.
I force my students to do C development. And it turns out that it is not that hard if you approach it with modern tools which catch a lot of problems. The lack of abstraction is fixed with good libraries.
C evolved a lot and many foot guns are not a problem anymore. For example for
The warning is very clear. If you did intend to use the result of an assignment as a truth value, you would notice. In any case, I have not had a single problem with this type of error in the last decades, working with programmers of various skill levels including beginners.
I absolutely agree that learning to create your own abstractions is an incredibly useful skill. It depends though. For a programming course this makes absolute sense. But for applied problems in, say, biomedical engineering, this does not work. Many students know only a bit of Python, and then it is too much and "too inconvenient" to start from scratch in C. With Python they have a lot of things more easily available, so they make quick progress. This does not lead to good results though! For most of the Python projects, we end up throwing away the code later. Another problem is that students often do not know what they are doing, e.g. they use some statistical package or visualization package and get nice-looking results, but they do not know what they mean and often they are entirely wrong. For machine learning projects it is even worse. So much nonsense and so many errors from copying other people's Python code....
Python, like Basic, abstracts far too many details away from students, and trying to convince people later that they need to know how a CPU works is nearly impossible.
In general, digging deep enough down a stack, and it drops back into the gsl:
> I have seen C wizardry up close that I know I simply cannot do.
I have written C at least a few times per year for over 30 years. About ten years of that was OS development on Solaris and its derivatives.
Articles like this show crazy things you can do in C. I’ve never found the need to do things like this and have never seen them in the wild.
The places that wizardry is required are places like integer and buffer overflow, locking, overall structure of large codebases, build infrastructure, algorithms, etc. Many of these are concerns in most languages.
> auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1)
When I was a student in the 90s advice like this would have been helpful. Compiler warnings and static analyzers are so much better now that tricks like this are not needed.
> I knew a guy that’d auto reject C PR’s if they didn’t use the syntax if (1==x) rather than if (x==1). The former will not compile if you accidentally use variable assignment instead of equality operator (which everyone has done at some point).
That's not so much of a footgun anymore - the common C compilers will warn you about that so there's not much point in defending against it.
Same with literal format string parameters to printf functions: the compiler is very good at warning about mismatched types.
That’s precisely where my little professional C experience was. I then switched to a python shop and was initially horrified at some conventions, took some getting used to.