by LegionMammal978 1178 days ago
Going off of the C17 numbering,

4/7: "A conforming program is one that is acceptable to a conforming implementation."

This definition has no restrictions regarding runtime requirements, unlike for strictly conforming programs.

5/1: "An implementation translates C source files and executes C programs in two data-processing-system environments, which will be called the translation environment and the execution environment in this International Standard. Their characteristics define and constrain the results of executing conforming C programs constructed according to the syntactic and semantic rules for conforming implementations."

So clause 5 binds all "conforming C programs constructed according to the syntactic and semantic rules for conforming implementations", not just strictly conforming programs.

Now, 4/3: "A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3."

We can interpret this as saying that "a program that is correct in all other aspects... containing unspecified behavior" is "constructed according to the syntactic and semantic rules for conforming implementations" even if it only works when "operating on correct data".

From there, it does not seem very difficult to conclude that in general, a conforming program which contains fully specified behavior, assuming it operates on correct data, is also "constructed according to the syntactic and semantic rules for conforming implementations", and is therefore bound by clause 5. If we were to instead take the negation of this conclusion, that a program is not bound by clause 5 if any possible input data causes it to violate a runtime requirement, then the wording of 4/3 would not make any sense.

(In other words, every conforming program has a corresponding set of "correct input data", and it is correct and bound by clause 5 if it does not violate any runtime requirements when given any input data within that set. A program is only incorrect if that set is empty, i.e., the UB is unconditional.)
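
As a minimal sketch of what such a set of "correct input data" might look like (the names `is_correct_input` and `quotient` are hypothetical, chosen only for illustration), consider integer division, which C leaves undefined for a zero divisor and for `INT_MIN / -1`:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true exactly when a / b has defined behavior in C. */
bool is_correct_input(int a, int b) {
    if (b == 0) {
        return false; /* division by zero is undefined */
    }
    if (a == INT_MIN && b == -1) {
        return false; /* quotient is not representable, so undefined */
    }
    return true;
}

/* Hypothetical function: its behavior is defined only for inputs
   accepted by is_correct_input(); it performs no check itself. */
int quotient(int a, int b) {
    return a / b;
}
```

Under this reading, a call such as `quotient(10, 2)` lies inside the set and its execution is bound by clause 5, while `quotient(1, 0)` lies outside it.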

---

Meanwhile, I suppose you're looking at C++17. The note in [intro.execution]/4 is non-normative, and all of the normative language (e.g., in the very next paragraph) attaches runtime UB to the execution as a function of the input data, not to the program alone.

[intro.compliance]/(2.1) and its (non-normative) footnote further clarify the distinction, stating, "If a program contains no violations of the rules in this International Standard, a conforming implementation shall, within its resource limits, accept and correctly execute that program.... 'Correct execution' can include undefined behavior, depending on the data being processed; see 1.3 and 1.9." This suggests that a program that executes undefined operations does not necessarily contain any rule violations.


I think your confusion here is coming from the fact that "unspecified behavior" is a specific thing in the standard's terminology (looking specifically at n3088 right now), distinct from the concept of "undefined behavior". So when it says "constructed according to the semantic rules", that inherently excludes UB, which definitionally has no semantics prescribed for it (unlike unspecified behavior). For brevity's sake, I'm ignoring the allowance that an implementation can give any particular UB defined semantics.

To reduce my point to a list of options:

* No UB in the program -> Specified by 5.*

* No UB for certain inputs -> Specified for those inputs, not specified otherwise

* UB present, but not on any possible execution path -> Not specified (this is the argument)

* UB present on every possible execution path -> Not specified (definitionally)

I linked this in a sibling comment, but there was a proposal to amend C2x's wording here to specifically exclude this type of insanity (n2278), but it wasn't adopted because it could potentially prohibit optimizations and the working group doesn't really want to address the issue of undefined behavior with more definitions.

My claim is, "If a program violates the semantic rules at runtime given input A, but does not violate the semantic rules at runtime given input B, then the execution of the program given input B will be defined by clause 5." This is because the wording of 4/3 implies that "being constructed according to the semantic rules" is a function of both the program and the data it is given, such that the behavior can be defined on some inputs by clause 5, but undefined on other inputs.

As I understand it, your claim is, "If a program violates the semantic rules at runtime given input A, then the behavior of the program is undefined given input B, even if the program would not have violated the semantic rules at runtime given input B." Am I misunderstanding your claim?

> I linked this in a sibling comment, but there was a proposal to amend C2x's wording here to specifically exclude this type of insanity (n2278), but it wasn't adopted because it could potentially prohibit optimizations and the working group doesn't really want to address the issue of undefined behavior with more definitions.

N2278 seems entirely irrelevant to the question of whether potential UB given one input can cause unexpected behavior given another input. Instead, it seems to say, "If the program violates the semantic rules at runtime given input A, causing UB, then the implementation is forbidden from making that behavior identical to the program's hypothetical behavior if it had been given another input B." That looks pretty unworkable in the general case. (E.g., must compilers operate as if an out-of-bounds write on one object can modify the value of a totally different object?)
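
The parenthetical question can be made concrete with a small sketch (the function `store_then_read` is hypothetical). For `i < 4` the behavior is fully defined and the result is 7. For `i >= 4` the store is out of bounds, and an optimizing compiler will typically assume the store cannot touch `b` and return 7 anyway, which is the same result an in-bounds input would have produced:

```c
#include <stddef.h>

/* Hypothetical example: behavior is defined only when i < 4. */
int store_then_read(size_t i) {
    int a[4] = {0, 0, 0, 0};
    int b = 7;

    a[i] = 1;  /* out-of-bounds store, hence UB, when i >= 4 */

    /* A compiler may assume the store above cannot modify b,
       and so may fold this to "return 7". */
    return b;
}
```

Only in-bounds calls such as `store_then_read(0)` or `store_then_read(3)` have a result the standard guarantees.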

I realized on thinking a bit more that two of the cases I mentioned are actually identical, so you're correct. Good to know going forward!