by GedByrne 3294 days ago
The problem is that the language allows this code and the compiler accepts it.

If the code really does make no sense, then why not disallow it or capture it with a compiler warning?

Instead the optimizer is allowed to silently remove the code.

Isn't it possible that the programmer knows something that the optimizer is unaware of?


> If the code really does make no sense, then why not disallow it or capture it with a compiler warning?

You answered your own question:

> Isn't it possible that the programmer knows something that the optimizer is unaware of?

Here's a fun exercise. Find the best static analyzer you can. Run it on some substantial body of C code. Count all the false positives and false negatives (good luck).

Now you should have an idea as to why you can't simply require the compiler to reject the code and issue a diagnostic. But making code like this UB is about as close as you can get to "disallowing it."

Alternatively you could try to eliminate the problem entirely by removing the whole concept of uninitialized memory and requiring that all variables are initialized to some value by default. Depending on which camp you're in, this is a step forward or a step back. Some people just see the words UB and think it is Satan, bad bad bad. For them this is a step forward.

But if all variables are supposed to have a definite starting value, compilers & static analyzers suddenly can't warn you about the cases where they can tell you forgot to initialize something, because all of a sudden such code is totally legit. The analyzer can't tell whether you forgot an initialization or whether you're intentionally relying on the default value.
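C already has one corner where this tradeoff is visible: objects with static storage duration are zero-initialized by default. A minimal sketch of the ambiguity (the names `scale` and `apply_scale` are made up for illustration):

```c
/* Static storage duration: the C standard guarantees this starts at 0.
 * Did the author really want 0, or forget to write "= 1"?
 * Both readings are valid C, so a tool has nothing to warn about. */
static int scale;

/* Hypothetical helper: multiplies by the (possibly forgotten) scale. */
int apply_scale(int x)
{
    return x * scale;
}
```

Calling `apply_scale(7)` returns 0 here, which is well defined and quite possibly not what was meant.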

That's a big step back as far as I am concerned. You eliminate a problem because "UB is bad!" and create another problem we can't even issue warnings for, without potentially generating loads of false positives.

Now if one wanted to complain that his compiler detects an obvious case of UB and doesn't warn him about it, he should take it up with the compiler developers (or see if he can help himself by turning on the right flags). Alternatively one could invest in a good static analyzer. The standard committee can't really fix this problem without making substantial changes to the language. Instead I'm glad they allow relatively simplistic implementations that don't do deep advanced analysis.

C# is an example of a language that decided that building definite assignment analysis into the language spec was worth the tradeoff.

> Now you should have an idea as to why you can't simply require the compiler to reject the code and issue a diagnostic.

It is quite a different beast, but C# produces compile time failures when attempting to read from a potentially uninitialised variable. What is it that makes C different?

Emphasis on potentially. I did not mean to imply that it is technically impossible to "amend" the standard in the C# direction (but complete analysis with no false positives and no false negatives is impossible!).

However, requiring complicated (yet inherently incomplete, in the sense that it must have "unknowns") data flow analysis and then requiring the compiler to reject plausibly correct programs would be a massive departure from the guiding principles of C standardisation.
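To make "plausibly correct programs" concrete, here is a sketch of the kind of code I have in mind. The function name and parameters are hypothetical. The variable is written and read under the same condition, so no uninitialized read can ever happen, yet a C#-style definite assignment rule (which, as far as I know, does not correlate two separate tests of the same variable) would reject the read:

```c
/* Hypothetical example: `v` is assigned and read under the same condition. */
int scaled_or_default(int have_value, int raw)
{
    int v;                    /* deliberately not initialized */

    if (have_value)
        v = raw * 2;          /* assigned only on this path */

    /* ... unrelated work that touches neither have_value nor v ... */

    if (have_value)
        return v;             /* read only when it was assigned */
    return -1;                /* fallback when there is no value */
}
```

Under a mandatory rule, the only way to get this accepted is to add a dummy `int v = 0;` that the program never needs.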

In particular, and I quote from the C99 rationale:

> Existing code is important

Requiring correct existing programs to be rejected is not compatible with this principle.

> Keep the spirit of C. [..] Some of the facets of the spirit of C can be summarized in phrases like:

> Trust the programmer.

> Don't prevent the programmer from doing what needs to be done.

> Keep the language small and simple.

> Make it fast, even if it is not guaranteed to be portable.

Complicated mandatory analysis that rejects correct programs pretty much violates every single one of those facets: don't trust the programmer, assume his program is wrong if you can't prove it right. Reject correct programs and in doing so, get in the way of the programmer trying to do what needs to be done. Complicated analysis is not making the language smaller and simpler. Dealing with the shortcomings of said analysis may require unnecessary code (initializations) that is nothing but a performance loss.

> Codify existing practice to address evident deficiencies.

The C standards have always had a focus on codifying existing practice. What are the C implementations that have this practice you propose today?

> Minimize incompatibilities with C90 (ISO/IEC 9899:1990). It should be possible for existing C implementations to gradually migrate to future conformance, rather than requiring a replacement of the environment. It should also be possible for the vast majority of existing conforming programs to run unchanged.

Earlier it was stated that existing implementations are not important, in contrast to existing code. However, in light of this paragraph, it is not supposed to be taken to its logical extreme. Definite assignment analysis can require potentially major changes to existing implementations.

> Maintain conceptual simplicity. The Committee prefers an economy of concepts that do the job. Members should identify the issues and prescribe the minimal amount of machinery that will solve the problems. The Committee recognizes the importance of being able to describe and teach new concepts in a straightforward and concise manner.

Interpret this how you will. Having spent a lot of time reading the C spec drafts, I would argue that describing the equations for assignment analysis would look rather complicated and out of place. Because C just doesn't have that sort of complicated machinery.

Basically, this would require substantial changes as I said in my previous post. And these changes would not be in line with the spirit of C.

Why do people want to strong-arm C into being something that it is not? We use C because of what it is, not because of what you want it to be. If you want a new language, with a different spirit and different guiding principles, why don't you go use D or C++ or Rust or whatever, or invent a new language? I don't see a reason to break C to please all the C haters who wouldn't use the language anyway.

> In particular, and I quote from the C99 rationale

Maybe you should read the C11 principles instead, and I quote:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm

"12. Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities. While it should not be totally disregarded as a facet of the spirit of C, the C11 version of the C Standard should take into account that programmers need the ability to check their work."

I think it's a stretch to read "programmers need the ability to check their work" as "don't trust the programmer".

They codified interfaces with bounds checking. They gave us tools. They deprecated a function that cannot be checked and used securely inside a program without also checking what happens outside the program.

That's good, they give the ability, they give tools. But that's not at all distrusting the programmer, least of all in the sense that I spoke of: assume the program is wrong if you can't prove it right.

And because of that, I do not think these updated guidelines are a reversal of spirit or incompatible with what C always was and still is.

They still trust the programmer in general, and also allow the use of older interfaces that do no bounds checking. They just gave new toys for the programmer who doesn't trust himself. That may help him check the work.

The standard can't mandate a diagnostic for undefined behaviour, because it's not always possible to determine if behaviour is undefined. (Turing completeness) However, an implementation is allowed to issue any diagnostics it likes, so if you want one for this case, bug your compiler vendor to add one. Or, check the manual - it's possible that your compiler already does issue a diagnostic if you turn the right warnings on.

> The standard can't mandate a diagnostic for undefined behaviour, because it's not always possible to determine if behaviour is undefined.

Yet that argument is stupid if we are talking about compilers issuing warnings for the UB they "detect", because by definition they have detected it. It might not always be possible, since in some cases it does not work exactly like that, but a warning is desirable at least before the compiler removes code.

It might also not be easy given the current internal design of compilers, but then I argue those designs should be changed, because it is just too dangerous.

You are free to pave the way.

But pretending that compiler developers are currently just ignoring the issue out of mischief and preference for performance above anything else is disingenuous. If the problem were so simple, we'd probably already have a dozen free static analyzers that do a good job, and you'd be happy to use them.

The thing is, "detecting" UB in the sense you imply is usually not what happens in the context of said optimizations (that's not to say they won't attempt to do it... ever seen a compiler warn you about use of uninitialized variable?).

What compilers do is assume the program is correct. Following that assumption, they perform an optimization that is only correct for correct programs. That is in fact very simple to do. They do not try to prove or disprove that the program actually invokes UB there -- that is impossible in general, and even in the subset of cases where it is possible it could require deep whole-program (including libraries!) analysis that could take massive computational resources.
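A small sketch of what "assume the program is correct" looks like in practice, using signed overflow as the UB in question (the function name is made up):

```c
/* For every x where x + 1 does not overflow, the comparison is true.
 * Overflow of a signed int is UB, so an optimizer is allowed to fold
 * the whole body to "return 1" without ever trying to decide whether
 * some caller actually passes INT_MAX. Nothing was "detected". */
int next_is_greater(int x)
{
    return x + 1 > x;
}
```

No analysis of callers is involved: the simplification is valid for every program that stays within defined behaviour, and that is all the compiler needs.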

Many people here keep trivializing the problem but I don't think they understand the problem at any depth.

And that is why I say you should pave the way, not in a smug "fuck off get off my lawn fix your own problem" sense, but to get people to honestly gain some background in program analysis, read research papers, study existing analyzers, and gain some appreciation for what it takes. It is far, far from trivial. Especially if you can't just take the language and change it to your liking (breaking nearly all existing code) until all the hard stuff is out.

And in saying that, I suggest that it is easier to start with a new language (or, at least some existing language other than C) that was designed from ground up with such analysis & correctness provability in mind.

I might actually end up doing that.

But honestly what a waste of resources. I would prefer to be able to work on topics that are not problems created by a recent shift in views on how compilers should be designed and how they should optimize, a shift that is for the worse IMO.

And yep I know the reason in the current design for why it is not always easy. That's why I added the caveat that maybe they should be changed.

But then you cannot just say that all the users criticizing and "trivializing" the problem are in the wrong and do not know what they are talking about. Most of the discussion I've seen does one of three things. It points at technical justifications for why those crazy optimizations should not be done, be it for safety or to leave room for real optimizations by the programmer (for comparison, I'm not impressed by the ability of compilers to take poor code and transform it into somewhat less poor code, especially when the language in question is C, which has always required mastery). Or it points to discussions among compiler authors who are borderline insane (when you see them discussing how the standard would technically allow them not to treat uint8_t and char as aliasing, and asking themselves whether they could "optimize" more thanks to that, you clearly understand they have completely lost it, and that things will end badly). Or it points to bugs "exposed" by that class of optimizations. I would even say introduced, because 1) when your binary has been OK on all your target platforms for a few decades, and a new compiler comes along and breaks it on a technicality, it can be argued that regardless of said technicality the problem is mainly with the compiler; and 2) compilers compromise, e.g. for technically invalid benchmarks, so they have no credibility when they don't take into account major breakage in major software; they are just acting like spoiled children who want to do what they want regardless of the consequences.
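For anyone unfamiliar with the aliasing point: character types are the one family the standard explicitly lets you use to inspect the bytes of any object. A minimal sketch (the helper `byte_sum` is hypothetical) of code that relies on that guarantee:

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of an arbitrary object. Reading through
 * `unsigned char *` is allowed to alias any object type. Whether a
 * `uint8_t *` must get the same treatment is the debate referred to
 * above; on common implementations uint8_t is a typedef for
 * unsigned char, but that is an assumption about the platform. */
unsigned byte_sum(const void *obj, size_t len)
{
    const unsigned char *p = obj;
    unsigned sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}
```

A great deal of existing code does the same thing with `uint8_t *` instead, which is why the idea of treating the two differently alarms people.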

What is never addressed by C/C++ compiler authors and their fan base is why the "nonportable" part of the standard is not taken into consideration anymore. This was the original spirit of UB. NOT: "this will permit EXTRA optimization in abstract compiler code"; the only "optimizations" came from differences between target processors: leaving things UB allowed a better (trivial) mapping to each instruction set, and it was assumed that the programmer knew about those differences and could use them when needed.
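Oversized shift counts are a typical example of that hardware-driven kind of UB. As far as I know, x86 masks the count of a 32-bit shift to its low 5 bits while some other instruction sets produce 0, so the standard leaves a count greater than or equal to the width undefined and a plain `x << n` can compile to a single instruction everywhere. A sketch of what pinning down one behaviour portably costs (the name `shl_or_zero` and the choice of 0 are my own assumptions):

```c
#include <stdint.h>

/* Portable left shift: counts of 32 or more yield 0 instead of UB.
 * The extra comparison is the price of not relying on what a given
 * processor happens to do with an oversized count. */
uint32_t shl_or_zero(uint32_t x, unsigned n)
{
    return n < 32u ? (uint32_t)(x << n) : 0u;
}
```

With the bare shift, `shl_or_zero(1u, 32)` would be whatever the target does; here it is 0 on every target.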

Now there is no mapping anymore. We all have to target the common denominator of all targets, past and present, even dead ones. The result is a very very very poor language. Unsuitable for more and more things.