by quietbritishjim 306 days ago
> Including a header that is not in the program, and not in ISO C, is undefined behavior.

What is this supposed to mean? I can't think of any interpretation that makes sense.

I think ISO C defines the executable program to be something like the compiled translation units linked together. But header files do not have to have any particular correspondence to translation units. For example, a header might declare functions whose definitions are spread across multiple translation units, or define things that don't need any definitions in particular translation units (e.g. enum or struct definitions). It could even play macro tricks so that it declares or defines different things each time you include it.
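
As a rough illustration of the macro-tricks point, here is a single-file sketch in the X-macro style. The COLOR_LIST macro is a hypothetical stand-in for a header that would normally be included more than once, and every name in it is invented for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for a header that holds only X(...) entries; in a
   multi-file project it would be #included once per expansion below. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* First expansion: the list becomes enumerators. */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Second expansion: the same list becomes a table of strings. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };
#undef AS_STRING

static_assert(COLOR_COUNT == 3, "three colors are listed above");

/* Returns the printable name of a color, or "?" when out of range. */
const char *color_name(enum color c)
{
    return ((int)c >= 0 && c < COLOR_COUNT) ? color_names[c] : "?";
}
```

The same list yields an enum in one place and a string table in another, which is the sense in which one header can contribute different declarations on each inclusion.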

Maybe you mean it's undefined behaviour to include a header file that declares functions that are not defined in any translation unit. I'm not sure even that is true, so long as you don't use those functions. It's definitely not true in C++, where it's only a problem (not sure if it's undefined exactly) if you ODR-use a function that has been declared but not defined anywhere. (Examples of ODR-use are calling or taking the address of the function, but not, for example, using sizeof on an expression that includes it.)
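
A minimal C sketch of the sizeof case, assuming a hypothetical function never_defined that is declared but defined nowhere. Because the operand of sizeof is not evaluated here, no call is generated and the linker should never ask for a definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: declared here, defined in no translation unit. */
int never_defined(void);

/* Checked at translation time, so nothing is called at run time either. */
static_assert(sizeof(never_defined()) == sizeof(int),
              "the call expression has type int");

/* The operand of sizeof is unevaluated, so this does not reference
   never_defined in a way that requires a definition. */
size_t size_of_call_result(void)
{
    return sizeof(never_defined());
}
```

Replacing the sizeof expression with an actual call, or with &never_defined, would be expected to fail at link time.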


> I can't think of any interpretation that makes sense

Start with a concrete example. A header that is not in our program, or described in ISO C. How about:

  #include <winkle.h>

Defined behavior or not? How can an implementation respond to this #include while remaining conforming? What are the limits on that response?

> But header files do not have to have any particular correspondence to translation units.

A header inclusion is just a mechanism that brings preprocessor tokens into a translation unit. So, what does the standard tell us about the tokens coming from #include <winkle.h> into whatever translation unit we put it into?

Say we have a single file program and we made that the first line. Without that include, it's a standard-conforming Hello World.
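
A sketch of that single-file program, with one deliberate change: the include is wrapped in a __has_include check (standard in C23, a common extension before that) so the file builds whether or not the implementation happens to supply a <winkle.h>. The WINKLE_FOUND macro and both function names are hypothetical, added only for this example.

```c
/* The nested test keeps preprocessors without __has_include from ever
   evaluating it. */
#ifdef __has_include
#  if __has_include(<winkle.h>)
#    include <winkle.h>   /* contents are whatever the implementation supplies */
#    define WINKLE_FOUND 1
#  endif
#endif
#ifndef WINKLE_FOUND
#  define WINKLE_FOUND 0
#endif

#include <stdio.h>

/* Reports which of the two cases this translation unit ended up in. */
int winkle_header_found(void)
{
    return WINKLE_FOUND;
}

/* The Hello World part; returns 0 when the greeting was written. */
int say_hello(void)
{
    return puts("Hello, world!") >= 0 ? 0 : 1;
}
```

When the header is not found, everything that remains is described by ISO C. When it is found, the text pulled in ahead of the stdio.h include is outside anything this file controls.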

I think we are slowly getting closer to the crux of the matter. Are you saying that it's a problem to include files from a library since they are "not in our program"? What does that phrase actually mean? What are the bounds of "our program" anyway? Couldn't it be the set {main.c, winkle.h}?

> What are the bounds of our program?

N3220: 5.1.1.1 Program Structure

A C program is not required to be translated in its entirety at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this document. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

> Couldn't it be the set {main.c, winkle.h}

No; in this discussion it is important that <winkle.h> is understood not to be part of the program; no such header is among the files presented for translation, linking and execution. Thus, if the implementation doesn't resolve #include <winkle.h> we get the uninteresting situation that a constraint is violated.

Let's focus on the situation where it so happens that #include <winkle.h> does resolve to something in the implementation.

The bit of the standard that you've quoted says that the program consists of all files that are compiled into it, including all files that are found by the #include directive. So, if <winkle.h> does successfully resolve to something, then it must be part of the program by definition because that's what "the program" means.

Your question about an include file that isn't part of the program just doesn't make any sense.

(Technically it says that those files together make up the "program text". As my other comment says, "program" is the binary output.)

I see what you are getting at. Programs consist of materials that are presented to the implementation, and also of materials that come from the implementation.

So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

I agree that if such a file is found by the implementation it becomes part of the program, as makes sense and as that word is defined by ISO C, so it is not the right terminology to say that the file is not part of the program, yet may be found.

If the inclusion is successful, though, the content of that portion of that program is not defined by ISO C.

It still seems like you have invented some notion of "program" that doesn't really exist. Most suspicious is when you say this:

> So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

The thing is, there is no "external file set" that includes header files, so this sentence makes no sense.

Note that when the preprocessor is run, the only inputs are the file being preprocessed (i.e., the .c file) and the list of directories to find include files (called the include path). That's not really part of the ISO standard, but it's almost universal in practice. Then the output of the preprocessor is passed to the compiler, and now it's all one flat file so there isn't even a concept of included files at this point. The object files from compilation are then passed to the linker, which again doesn't care about headers (or indeed the top-level source files). There are more details in practice (especially with libraries) but that's the essence.

I wonder if your confusion is based on seeing header files in some sort of project-like structure in an IDE (like Visual Studio). But those are just there for ease of editing - the compiler (/preprocessor) doesn't know or care which header files are in your IDE's project, it only cares about the directories in the include path. The same applies to CMake targets: you can add include files with target_sources(), but that's just to make them show up in any generated IDE projects; it has no effect on compilation.

Or are you maybe just saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is in some sense not defined by the standard? If so, I don't think that matches the conventional meaning of undefined behaviour.

If it's neither of those, could you clarify what exactly you mean by "the external file set given to the implementation for processing"?

Do you just mean an attempt to include a file path that couldn't be found? That's not a correct usage of the term "program" – that refers to the binary output of the compilation process, whereas you're talking about the source files that are the input to the compilation. That sounds a bit pedantic but I really didn't understand what you meant.

I just checked, and if you attempt to include a file that cannot be found (in the include path, though it doesn't use that exact term) then that's a constraint violation, so the compiler is required to issue a diagnostic (and in practice it stops compilation). Not undefined behaviour.

Yes; we are more interested in the other case: it happens to be found.

What are the requirements then?

I don't get your point then. If the file is found then there is no undefined behaviour in the process of the file being included. There might be undefined behaviour in the overall translation unit after the text has been substituted in, but that's nothing to do with the preprocessor.

> If the file is found then there is no undefined behaviour in the process of the file being included.

Correct; but processing doesn't stop there.

> There might be undefined behaviour in the overall translation unit

But what does that mean; how do you infer that there might be undefined behavior?

Does ISO C define the behavior, or does it not?

ISO C has nothing to say about what is in <winkle.h> if such a header is found and didn't come from the program.

Without having anything to say about what is in it, if it is found at all, ISO C cannot define the behavior of the tokens that are substituted for that #include.
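
To make the dependence on those tokens concrete, here is a sketch in which a single #define stands in for whatever a found <winkle.h> might contribute. That line is purely an assumption for illustration, as are the GREETING_BUFFER_SIZE and greeting_buffer_size names.

```c
#include <assert.h>

/* Stand-in for tokens a found <winkle.h> might contribute. Purely
   hypothetical: ISO C does not say what such a header contains. */
#define GREETING_BUFFER_SIZE 1

/* ---- everything below is the program's own text ---- */

/* The program intends a 64-byte buffer unless configured otherwise. */
#ifndef GREETING_BUFFER_SIZE
#define GREETING_BUFFER_SIZE 64
#endif

static_assert(GREETING_BUFFER_SIZE > 0, "buffer size must be positive");

/* Reports the size the program actually ends up with. */
int greeting_buffer_size(void)
{
    char buffer[GREETING_BUFFER_SIZE];
    return (int)sizeof buffer;
}
```

With the stand-in line present the function returns 1. With it deleted the program's own default applies and it returns 64. The program's own text is identical in both cases.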