by kazinator 305 days ago
> I can't think of any interpretation that makes sense

Start with a concrete example: a header that is neither in our program nor described in ISO C. How about:

  #include <winkle.h>
Defined behavior or not? How can an implementation respond to this #include while remaining conforming? What are the limits on that response?

> But header files do not have to have any particular correspondence to translation units.

A header inclusion is just a mechanism that brings preprocessor tokens into a translation unit. So, what does the standard tell us about the tokens coming from #include <winkle.h> into whatever translation unit we put it into?

Say we have a single file program and we made that the first line. Without that include, it's a standard-conforming Hello World.


I think we are slowly getting closer to the crux of the matter. Are you saying that it's a problem to include files from a library since they are "not in our program"? What does that phrase actually mean? What are the bounds of "our program" anyway? Couldn't it be the set {main.c, winkle.h}?

> What are the bounds of our program?

N3220: 5.1.1.1 Program Structure

A C program is not required to be translated in its entirety at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this document. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

> Couldn't it be the set {main.c, winkle.h}

No; in this discussion it is important that <winkle.h> is understood not to be part of the program; no such header is among the files presented for translation, linking and execution. Thus, if the implementation doesn't resolve #include <winkle.h> we get the uninteresting situation that a constraint is violated.

Let's focus on the situation where it so happens that #include <winkle.h> does resolve to something in the implementation.
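A hedged sketch that puts both situations in one file: C23's __has_include (also long available as an extension in GCC and Clang) lets a translation unit ask whether <winkle.h> would resolve before including it, so the not-found case is skipped rather than becoming a constraint violation. WINKLE_FOUND and winkle_found are illustrative names, not from the thread.

```c
/* Probe for the fictitious <winkle.h> before including it.
   The outer test keeps compilers without __has_include happy:
   in a skipped group the inner #if expression is never evaluated. */
#if defined(__has_include)
#  if __has_include(<winkle.h>)
#    include <winkle.h>   /* found: ISO C says nothing about what this brings in */
#    define WINKLE_FOUND 1
#  endif
#endif
#ifndef WINKLE_FOUND
#  define WINKLE_FOUND 0  /* absent, or the compiler cannot tell us */
#endif

/* Reports which branch the preprocessor took for this translation unit. */
int winkle_found(void)
{
    return WINKLE_FOUND;
}
```

On a typical system winkle_found() returns 0. The interesting case for this thread is the other branch, where the include succeeds and the tokens that follow are whatever the implementation supplied.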

The bit of the standard that you've quoted says that the program consists of all files that are compiled into it, including all files that are found by the #include directive. So, if <winkle.h> does successfully resolve to something, then it must be part of the program by definition because that's what "the program" means.

Your question about an include file that isn't part of the program just doesn't make any sense.

(Technically it says that those files together make up the "program text". As my other comment says, "program" is the binary output.)

I see what you are getting at. Programs consist of materials that are presented to the implementation, and also of materials that come from the implementation.

So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

I agree that if such a file is found by the implementation, it becomes part of the program, as makes sense and as that word is defined by ISO C. So it is not correct terminology to say that a file is not part of the program and yet may be found.

If the inclusion is successful, though, the content of that portion of that program is not defined by ISO C.

It still seems like you have invented some notion of "program" that doesn't really exist. Most suspicious is when you say this:

> So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

The thing is, there is no "external file set" that includes header files, so this sentence makes no sense.

Note that when the preprocessor is run, the only inputs are the file being preprocessed (i.e., the .c file) and the list of directories to find include files (called the include path). That's not really part of the ISO standard, but it's almost universal in practice. Then the output of the preprocessor is passed to the compiler, and now it's all one flat file so there isn't even a concept of included files at this point. The object files from compilation are then passed to the linker, which again doesn't care about headers (or indeed the top-level source files). There are more details in practice (especially with libraries) but that's the essence.

I wonder if your confusion is based on seeing header files in some sort of project-like structure in an IDE (like Visual Studio). But those are just there for ease of editing - the compiler (/preprocessor) doesn't know or care which header files are in your IDE's project, it only cares about the directories in the include path. The same applies to CMake targets: you can add include files with target_sources(), but that's just to make them show up in any generated IDE projects; it has no effect on compilation.

Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard? If so, I don't think that matches the conventional meaning of undefined behaviour.

If it's neither of those, could you clarify what exactly you mean by "the external file set given to the implementation for processing"?

Let's drop the word "program" and use something else, like "project", since the word "program" is normative in ISO C.

The "project" is all the files going into a program that are supplied by something other than the implementation.

C programs can contain #include directives. Those #include directives can be satisfied in one of three ways: they can reference a standard header which is specified by ISO C and hence effectively built into the hosted language, such as <stdio.h>.

C programs can #include a file from the project. For instance someone's "stack.c" includes "stack.h". So yes, there is an external file set (the project) which can have header files.
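For the project case, here is roughly what stack.c becomes once its #include "stack.h" line is replaced by the header's text: one flat stream of tokens, with nothing left marking where the header began. The contents of both files are hypothetical; only the file names come from the comment above.

```c
/* ---- text that came from "stack.h" (hypothetical contents) ---- */
#ifndef STACK_H
#define STACK_H
#include <stddef.h>

#define STACK_CAP 16

struct stack {
    int items[STACK_CAP];
    size_t len;
};

int stack_push(struct stack *s, int v);
int stack_pop(struct stack *s, int *out);
#endif
/* ---- end of "stack.h"; the rest is stack.c itself ---- */

/* Push v; returns 1 on success, 0 if the stack is full. */
int stack_push(struct stack *s, int v)
{
    if (s->len == STACK_CAP)
        return 0;
    s->items[s->len++] = v;
    return 1;
}

/* Pop into *out; returns 1 on success, 0 if the stack is empty. */
int stack_pop(struct stack *s, int *out)
{
    if (s->len == 0)
        return 0;
    *out = s->items[--s->len];
    return 1;
}
```

Because we wrote both halves, we know every token here, and ISO C tells us how they behave. That is exactly what is missing when the pasted text comes from a header outside both the project and the standard.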

C programs can also #include something which is neither of the above. That something might not be found (constraint violation). Or it might be found (the implementation provides it). For instance <sys/mman.h>: not in your project, not in ISO C.

My fictitious <winkle.h> falls into this category. (It deliberately doesn't look like a common platform-specific header from any well-known implementation, but that doesn't matter to the point.)

> Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard?

Of course it isn't; no, I'm not saying that. The C standard gives requirements as to how a program (project part and other) is processed by the implementation, including all the translation phases, preprocessing among them.

To understand what the requirements are, we must consider the content of the program. We know the content of the project parts: that's in our files. We (usually indirectly) know the content of the standard headers from the standard; we ensure that we have met the rules regarding their correct use and what we may or may not rely on coming from them.

We don't know the content of successfully included headers that don't come from our project or from ISO C; or, rather, we don't know that content just from knowing ISO C and our project. In ISO C, we can't find any requirements as to what is supposed to be there, and we can't find it in our project either.

If we peek into the implementation to see what #include <winkle.h> is doing (and such peeking is usually possible), we are effectively looking at a document, and if we then infer from that document what the behavior will be, it is a documented extension, standing in the same place as what ISO C calls undefined behavior. Alternatively, we could look to actual documentation. E.g. POSIX tells us what is in <fcntl.h> without us having to look for the file and analyze the tokens. When we use it, we have "POSIX-defined" behavior.
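A minimal sketch of "POSIX-defined" behavior, assuming a POSIX-like system: open() and O_RDONLY come from <fcntl.h>, whose contents POSIX specifies and ISO C does not. The platform test and the probe_dev_null name are assumptions for illustration, not an exhaustive portability check.

```c
/* Ask for POSIX declarations unless the build already chose a level. */
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L
#endif

#if defined(__unix__) || defined(__APPLE__)
#  include <fcntl.h>    /* open(), O_RDONLY: specified by POSIX, not ISO C */
#  include <unistd.h>   /* close(): likewise POSIX */
#  define HAVE_POSIX_IO 1
#else
#  define HAVE_POSIX_IO 0
#endif

/* Returns 1 if /dev/null could be opened and closed, 0 on failure,
   and -1 when this build has no POSIX I/O to try. */
int probe_dev_null(void)
{
#if HAVE_POSIX_IO
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return 0;
    return close(fd) == 0 ? 1 : 0;
#else
    return -1;
#endif
}
```

Nothing in ISO C says what open() does or that O_RDONLY exists; our confidence that this works comes entirely from the POSIX document.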

#include <winkle.h> is in the same category of thing as __asm__ __volatile__ or __int128_t or what have you.
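For comparison, a hedged sketch of the __int128 case: GCC and Clang predefine __SIZEOF_INT128__ when they support the unsigned __int128 type (the keyword behind the __int128_t typedef), so a program can use the extension on the strength of the compiler's documentation and fall back to plain ISO C elsewhere. The mul64 name is illustrative.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply, returned as high and low halves. */
void mul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#ifdef __SIZEOF_INT128__
    /* Compiler extension: meaning defined by GCC/Clang documentation, not ISO C. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    /* Portable ISO C fallback: schoolbook multiply on 32-bit halves. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;   /* bits 0..63   */
    uint64_t p1 = a_lo * b_hi;   /* bits 32..95  */
    uint64_t p2 = a_hi * b_lo;   /* bits 32..95  */
    uint64_t p3 = a_hi * b_hi;   /* bits 64..127 */
    /* Sum of the middle 32-bit column plus the carry out of p0; cannot overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    *lo = (p0 & 0xFFFFFFFFu) | (mid << 32);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```

Both branches compute the same result; only the first relies on something outside ISO C, which is the same position a successfully found <winkle.h> puts us in.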

#include <winkle.h> could contain the token __wipe_current_directory_at_compile_time which the accompanying compiler understands and executes as soon as it parses the token. Or __make_demons_fly_out_of_nose. :)

Do you see the point? When you include a nonstandard header that is not coming from your project, and the include succeeds, anything can happen. ISO C no longer dictates the requirements as to what the behavior will be. Something unexpected can happen, still at translation time.

Now headers like <windows.h> or <unistd.h> are exactly like <winkle.h>: same undefined behavior.

Do you just mean an attempt to include a file path that couldn't be found? That's not a correct usage of the term "program" – that refers to the binary output of the compilation process, whereas you're talking about the source files that are the input to the compilation. That sounds a bit pedantic, but I really didn't understand what you meant.

I just checked, and if you attempt to include a file that cannot be found (in the include path, though it doesn't use that exact term), then that's a constraint violation and the compiler is required to issue a diagnostic (in practice, compilers also stop compilation). Not undefined behaviour.

Yes; we are more interested in the other case: it happens to be found.

What are the requirements then?

I don't get your point then. If the file is found, then there is no undefined behaviour in the process of the file being included. There might be undefined behaviour in the overall translation unit after the text has been substituted in, but that's nothing to do with the preprocessor.

> If the file is found then there is no undefined behaviour in the process of the file being included.

Correct; but processing doesn't stop there.

> There might be undefined behaviour in the overall translation unit

But what does that mean; how do you infer that there might be undefined behavior?

Does ISO C define the behavior, or does it not?

ISO C has nothing to say about what is in #include <winkle.h> if such a header is found and didn't come from the program.

Without having anything to say about what is in it, if it is found at all, ISO C cannot be giving a definition of behavior of the tokens that are substituted for that #include.