by clarry 3294 days ago
> If the code really does make no sense, then why not disallow it or capture it with a compiler warning?

You answered your own question:

> Isn't it possible that the programmer knows something that the optimizer is unaware of?

Here's a fun exercise. Find the best static analyzer you can. Run it on some substantial body of C code. Count all the false positives and false negatives (good luck).

Now you should have an idea as to why you can't simply require the compiler to reject the code and issue a diagnostic. But making code like this UB is about as close as you can get to "disallowing it."
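To make the false-positive problem concrete, here is a minimal sketch. The function name and the doubling are made up for illustration. Every read of `size` happens on a path where it was written, so the code is well-defined, but an analysis that does not correlate the two `have_len` tests sees a possibly uninitialized read.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only. */
int payload_size(bool have_len, int len)
{
    int size;               /* deliberately not initialized here */

    if (have_len)
        size = len * 2;     /* written only on this path */

    /* ... imagine unrelated work in between ... */

    if (have_len)
        return size;        /* read only on the path where it was written */
    return -1;
}
```

Whether a given compiler or analyzer flags this depends on how much path sensitivity it has. A rule that must reject whenever it cannot prove the write would reject this correct function.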

Alternatively you could try to eliminate the problem entirely by removing the whole concept of uninitialized memory and requiring that all variables are initialized to some value by default. Depending on which camp you're in, this is a step forward or a step back. Some people just see the words UB and think it is Satan, bad bad bad. For them this is a step forward.

But if all variables are supposed to have a definitive starting value, compilers & static analyzers suddenly can't warn you about the cases where they can tell you forgot to initialize something, because all of a sudden such code is totally legit. The analyzer can't tell whether you forgot to make an initialization or whether you're actually intentionally relying on the default value.

That's a big step back as far as I am concerned. You eliminate a problem because "UB is bad!" and create another problem we can't even issue warnings for, without potentially generating loads of false positives.
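C already has a corner where this happens: objects with static storage duration are zero-initialized. A small sketch (the names are hypothetical) of why a tool has nothing to warn about there:

```c
/* Static storage duration: the language guarantees this starts at 0. */
static int retry_limit;   /* intended default of 0, or a forgotten "= 3"? */

/* Hypothetical accessor, for illustration only. */
int attempts_allowed(void)
{
    /* Well-defined read. No analyzer can tell from the code alone
       whether relying on the implicit 0 was intentional. */
    return retry_limit + 1;
}
```

Extend that guarantee to every local variable and the same ambiguity applies everywhere.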

Now if one wanted to complain that his compiler detects an obvious case of UB and doesn't warn him about it, he should take it up with the compiler developers (or see if he can help himself by turning on the right flags). Alternatively one could invest in a good static analyzer. The standard committee can't really fix this problem without making substantial changes to the language. Instead I'm glad they allow relatively simplistic implementations that don't do deep advanced analysis.

2 comments

C# is an example of a language that decided that including definite assignment analysis into the language spec was worth the tradeoff.
> Now you should have an idea as to why you can't simply require the compiler to reject the code and issue a diagnostic.

It is quite a different beast, but C# produces compile time failures when attempting to read from a potentially uninitialised variable. What is it that makes C different?

Emphasis on potentially. I did not mean to imply that it is technically impossible to "amend" the standard in the C# direction (but complete analysis with no false positives and no false negatives is impossible!).

However, requiring complicated (yet inherently incomplete, in the sense that it must have "unknowns") data flow analysis and then requiring the compiler to reject plausibly correct programs would be a massive departure from the guiding principles of C standardisation.

In particular, and I quote from the C99 rationale:

> Existing code is important

Requiring correct existing programs to be rejected is not compatible with this principle.

> Keep the spirit of C. [..] Some of the facets of the spirit of C can be summarized in phrases like:

> Trust the programmer.

> Don't prevent the programmer from doing what needs to be done.

> Keep the language small and simple.

> Make it fast, even if it is not guaranteed to be portable.

Complicated mandatory analysis that rejects correct programs pretty much violates every single one of those facets: don't trust the programmer, assume his program is wrong if you can't prove it right. Reject correct programs and in doing so, get in the way of the programmer trying to do what needs to be done. Complicated analysis does not make the language smaller and simpler. Dealing with the shortcomings of said analysis may require unnecessary code (initializations) that is nothing but a performance loss.
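A sketch of the "unnecessary initialization" case, under the assumption of a hypothetical rule that cannot see through a function call. Both function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical producer: writes at most cap bytes, returns the count written. */
size_t fill_greeting(char *out, size_t cap)
{
    static const char msg[] = "hello";
    size_t n = sizeof msg - 1;
    if (n > cap)
        n = cap;
    memcpy(out, msg, n);
    return n;
}

size_t greeting_checksum(void)
{
    char buf[4096];          /* uninitialized; only buf[0..n) is read below */
    size_t n = fill_greeting(buf, sizeof buf);
    size_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += (unsigned char)buf[i];
    return sum;
}
```

This is correct as written. A mandatory rule that cannot prove `fill_greeting` wrote the bytes being read would push you toward `char buf[4096] = {0};`, which asks for 4 KiB to be cleared just to satisfy the checker. An optimizer may or may not remove that work.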

> Codify existing practice to address evident deficiencies.

The C standards have always had a focus on codifying existing practice. What are the C implementations that have this practice you propose today?

> Minimize incompatibilities with C90 (ISO/IEC 9899:1990). It should be possible for existing C implementations to gradually migrate to future conformance, rather than requiring a replacement of the environment. It should also be possible for the vast majority of existing conforming programs to run unchanged.

Earlier it was stated that existing implementations are not important, in contrast to existing code. However, in light of this paragraph, that statement is not supposed to be taken to its logical extreme. Definite assignment analysis can require potentially major changes to existing implementations.

> Maintain conceptual simplicity. The Committee prefers an economy of concepts that do the job. Members should identify the issues and prescribe the minimal amount of machinery that will solve the problems. The Committee recognizes the importance of being able to describe and teach new concepts in a straightforward and concise manner.

Interpret this how you will. Having spent a lot of time reading the C spec drafts, I would argue that describing the equations for assignment analysis would look rather complicated and out of place, because C just doesn't have that sort of complicated machinery.

Basically, this would require substantial changes as I said in my previous post. And these changes would not be in line with the spirit of C.

Why do people want to strong-arm C into being something that it is not? We use C because of what it is, not because of what you want it to be. If you want a new language, with a different spirit and different guiding principles, why don't you go use D or C++ or Rust or whatever, or invent a new language? I don't see a reason to break C to please all the C haters who wouldn't use the language anyway.

> In particular, and I quote from the C99 rationale

Maybe you should read the C11 principles instead, and I quote:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm

"12. Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities. While it should not be totally disregarded as a facet of the spirit of C, the C11 version of the C Standard should take into account that programmers need the ability to check their work."

I think it's a stretch to read "programmers need the ability to check their work" as "don't trust the programmer."

They codified interfaces with bounds checking. They gave us tools. They deprecated a function that cannot be checked and used securely inside a program without also checking what happens outside the program.

That's good, they give the ability, they give tools. But that's not at all distrusting the programmer, least of all in the sense that I spoke of: assume the program is wrong if you can't prove it right.

And because of that, I do not think these updated guidelines are a reversal of spirit or incompatible with what C always was and still is.

They still trust the programmer in general, and also allow the use of older interfaces that do no bounds checking. They just gave new toys for the programmer who doesn't trust himself. That may help him check the work.
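For illustration, assuming the function in question is `gets` (which C11 removed), here is a minimal sketch of the checkable alternative using plain `fgets`. The helper name `read_line` is made up. The Annex K bounds-checking interfaces are optional and many implementations do not ship them, so this sticks to the core library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity cap), strips a
   trailing newline if one was read, returns 1 on success and 0 on EOF
   or error. Input longer than cap - 1 characters is truncated. */
int read_line(FILE *in, char *buf, size_t cap)
{
    if (cap == 0 || fgets(buf, (int)cap, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

The caller passes the capacity, so the bound can be checked inside the program. With `gets` there was no capacity to pass, and safety depended entirely on what the outside world fed in.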