by copsarebastards 3896 days ago
> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. --Charles Babbage

The modern version of this seems to be:

"Mr. Babbage, I put the wrong figures into the machine and the wrong answers came out! Please fix it this, this has security implications!"

You can't reasonably expect the compiler to make your insecure code secure.

Calling them "language lawyers" is some entitled crap. GCC commits to implement the specification of the language. Expecting them to maintain some huge number of undefined behaviors is literally expecting them to do something they never said they would do and couldn't do even if they said they would.


The problem is that it's often perfectly clear, reasonable code on all the systems it was intended to run on. For example, on all Unix-like systems, pointer arithmetic is simply arithmetic and behaves like it. (C's predecessor didn't even have separate pointer and integer types.) So prior to compiler optimisations, this series of operations is safe and well-behaved on all architectures Linux supports even if a is NULL:

  int *b = &a->something; // pointer arithmetic, doesn't dereference a.
  if(a == NULL) return 0;
  else something_critical = a->somethingelse;
However, some non-Unix address models that Linux doesn't support don't permit pointer arithmetic on NULL pointers. So the ANSI C standards committee declared it undefined. Which means that gcc can - and eventually did - eliminate the NULL pointer check. This has resulted in privilege escalation vulnerabilities in Linux that didn't exist until gcc decided to optimise the code, some of them quite well-hidden.
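To make the pattern above concrete, here is a minimal sketch in C. The struct and function names are hypothetical, chosen only for illustration. It contrasts the ordering from the snippet (address computed before the NULL check) with an ordering that tests the pointer first, which leaves the compiler nothing to infer about `a` being non-NULL.

```c
#include <stddef.h>

/* Hypothetical structure standing in for whatever `a` points to. */
struct device {
    int something;
    int somethingelse;
};

/* Ordering from the snippet: the address is formed before the check.
 * Per the comment above, the compiler may treat this as proof that
 * a != NULL and then delete the test that follows. */
int read_risky(struct device *a)
{
    int *b = &a->something;   /* treated as undefined when a is NULL */
    if (a == NULL)
        return 0;             /* may be removed by the optimiser */
    (void)b;
    return a->somethingelse;
}

/* Check first, then touch the pointer: the test cannot be inferred away. */
int read_checked(struct device *a)
{
    if (a == NULL)
        return 0;
    int *b = &a->something;
    (void)b;
    return a->somethingelse;
}
```

Only `read_checked` is safe to call with NULL. GCC also has a `-fno-delete-null-pointer-checks` option that turns off this particular inference, for code that cannot easily be reordered.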
I understand the problem, I'm saying that it's not GCC's problem. If you don't want undefined behavior, don't put undefined behavior in your code. The code you wrote isn't clear or reasonable, because it relies on undefined behavior. It's a valid criticism that this code does appear to be straightforward when it isn't, but that's not a criticism of GCC, it's a criticism of ANSI C. If you don't like it, use a better language. C was designed 4 decades ago; and they can't possibly have foreseen every problem that we've discovered in that time.
ANSI C didn't really do anything wrong here, though - they created a least-common-denominator spec of what you could reasonably expect from C across all platforms. Pointer arithmetic on NULL pointers had to be considered undefined (not just unspecified) in ANSI C, because on certain commercially-important proprietary systems it generated a hardware trap that caused the OS to kill your process. The problem is that the gcc developers insisted on actually making that code behave as undefined even though it didn't make sense to.

Also, I should note that a lot of code - particularly the Linux kernel - isn't actually using ANSI C anyway. They're using a superset of it with gcc extensions and they have a whole bunch of architecture-specific code too.

> If you don't want undefined behavior, don't put undefined behavior in your code.

I'd quip that this is statistically impossible for a sufficiently large codebase.

> it's a criticism of ANSI C. If you don't like it, use a better language.

This is my basic stance. However, if I'm e.g. in a situation where I have a C or C++ codebase I can't afford to rewrite from scratch, I'd like to use a "Better C" compiler, where "Better C" is a slightly less bad version of "ANSI C" - some undefined behavior removed, for example.

As shorthand, I'll generally refer to compilers for "Better C" as "Good C Compilers".

GCC is not trying to be a Good C Compiler. They've decided these things aren't their problem. Which is... fair. That's their choice. I do not for one minute pretend to understand that choice however - and it gives me yet one more reason to switch to a Good C Compiler.

Good luck with that. I suspect the only good C is not C.
I don't disagree - but there's value in harm reduction, no?
I'm reminded of http://mjg59.livejournal.com/108257.html :

"POSIX says this is fine, so any application that expected this behaviour is already broken by definition. But this is rules lawyering. POSIX says that many things that are not useful are fine, but doesn't exist for the pleasure of sadistic OS implementors. POSIX exists to allow application writers to write useful applications. If you interpret POSIX in such a way that gains you some benefit but shafts a large number of application writers then people are going to be reluctant to use your code. You're no longer a general purpose filesystem - you're a filesystem that's only suitable for people who write code with the expectation that their OS developers are actively trying to fuck them over."

GCC doesn't, or at least shouldn't, exist to implement the ANSI standard. It exists to help people to write useful programs.

And it does that by implementing a compiler for a language with an existing standard, which is ANSI. If you don't like the ANSI C standard (and I admit it's not perfect), don't use a compiler for ANSI C.

Also, this is not just a GCC problem; all the existing C compilers have the issue to some extent. After all, STACK (the MIT tool to detect undefined behavior) is based on clang. And ICC exploits the same UB tricks AFAIK.

> And it does that by implementing a compiler for a language with existing standard, which is ANSI. If you don't like the ANSI C standard (and I admit it's not perfect), don't use a compiler for ANSI C.

I'm not arguing that GCC should violate the ANSI standard; rather it should provide additional guarantees above what ANSI requires (which was always the intent of the standard; the standard defines the absolute minimum that cross-platform programs can depend on, the reason so much is undefined is to allow compilers to have their own strategies for what should happen in those cases, not to require that compilers blow up in those cases). Honestly I think the ANSI side of things is a red herring; when given the option of some change that will slightly improve performance on some benchmarks, but make a lot of user code silently fail, a responsible developer should know to reject that change whether or not that change violates some standard.

> Also, this is not just GCC problem, all the existing C compilers have the issue to some extent.

The post is claiming that GCC is the worst of them. Certainly my impression is that clang is substantially less aggressive at exploiting UB; I don't know ICC well enough to comment.

> I'm not arguing that GCC should violate the ANSI standard; rather it should provide additional guarantees above the what ANSI requires ...

The problem with that approach is that it introduces dependency on the compiler. The original code was ANSI C and thus should compile fine on all compilers compatible with ANSI C, the new code is not as each compiler will decide to handle undefined behavior differently. Either you'll make the exact compiler a hard dependency (i.e. it always has to be compiled with gcc and fails to build with everything else), or it will produce "correct" binaries on some compilers and "incorrect" binaries on others. That's hardly an improvement.

The only way out of this is either to abandon C and use a language with stronger guarantees, or make the ANSI C more strict by adding the guarantees to the standard. Which is not going to happen, I guess.

> The post is claiming that GCC is the worst of them. Certainly my impression is that clang is substantially less aggressive at exploiting UB; I don't know ICC well enough to comment.

GCC is also the most widely used, so people tend to spot issues more often.

All this "problem" is a direct consequence of using C without really understanding what guarantees it does and does not provide, and instead going by a simplified model of the environment. And then getting angry that the simplified model is not really correct.

> The original code was ANSI C and thus should compile fine on all compilers compatible with ANSI C, the new code is not as each compiler will decide to handle undefined behavior differently.

Except 40% of the original code already wasn't ANSI C.

> Either you'll make the exact compiler a hard dependency (i.e. it always has to be compiled with gcc and fails to build with everything else), or it will produce "correct" binaries on some compilers and "incorrect" binaries on others. That's hardly an improvement.

Having code that was broken under GCC not be broken under GCC absolutely is an improvement, particularly since in fact this kind of code often works on every other extant compiler.

> make the ANSI C more strict by adding the guarantees to the standard. Which is not going to happen, I guess.

Standards tend to codify existing practice. There's no reason the standard couldn't be made stricter - but the way we get to there from here is if the major compilers implement stricter restrictions and can show that they can be implemented consistently and users find them useful. GCC has been willing to do that kind of innovation for other parts of the standard.

> Except 40% of the original code already wasn't ANSI C.

Then why complain that an ANSI C compiler gets confused by it?

> Having code that was broken under GCC not be broken under GCC absolutely is an improvement, particularly since in fact this kind of code often works on every other extant compiler.

No, the code does not work on every other compiler. And if it does, there's no guarantee it will stay like that.

> Standards tend to codify existing practice. There's no reason the standard couldn't be made stricter - but the way we get to there from here is if the major compilers implement stricter restrictions and can show that they can be implemented consistently and users find them useful. GCC has been willing to do that kind of innovation for other parts of the standard.

AFAIK some of the limitations are there because of non-traditional platforms - some of them may be a thing of the past so removing them would be OK, but some are not (and thus won't be removed from the standard). And one of the points of ANSI C (and POSIX) is to define global guarantees, not per-platform ones.

Compilers will do things like remove a memset clearing a chunk of memory to zero (because it detects that the variable isn't read again). That sort of thing is bad for security.
Writing code that depends on the value of unread memory is bad for security.
I think you misunderstood the example -- the memory is cleared after use to ensure that if it's reallocated by someone else, or someone hooks up a debugger, the content can't be examined (except when the compiler removes this clearing attempt because of an optimization). Let's say that chunk of memory held a password -- you'd definitely want to clear it after use, even if you immediately free it and never plan to read it again.
That's actually a very good example, but I'd argue that this is actually a violation of the standard: memset is defined as setting the value in memory. Most optimizations on undefined behavior don't really fall into this category.

I guess you could group this kind of thing into the category of "dead code elimination" which is useful, but results in parts of the code written not producing the specified executable. I have to think on this example more.

It is allowed under the "as if" rule. If no visible aspects of the program are changed by an optimization, then it is allowed. The value stored in memory is not considered to be a visible aspect, and so the compiler is allowed to modify which memory is changed.

It's the same as inlining a function. The standard says that a function call is a function call. Compilers are still allowed to inline the call, even if it has not been specifically marked as "inline".
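For the memset case discussed above, a common workaround is to make the clearing writes count as observable behavior, so the "as if" rule no longer allows them to be dropped. The sketch below uses the widely seen volatile-pointer idiom. `secure_zero` and `use_and_clear` are hypothetical names, and whether the idiom is strictly guaranteed by the standard is debated, so treat it as an assumption that holds on mainstream compilers, not as a proof.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clear a buffer through a volatile pointer so the
 * stores are volatile accesses, which the compiler is expected to keep. */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Copies a secret into a local buffer, "uses" it, and clears it before
 * returning. A plain memset here could be removed as a dead store,
 * because `local` is never read again. */
size_t use_and_clear(const char *password)
{
    char local[64];
    size_t n = strlen(password);
    if (n >= sizeof local)
        n = sizeof local - 1;
    memcpy(local, password, n);
    local[n] = '\0';
    /* ... the secret would be used here ... */
    secure_zero(local, sizeof local);
    return n;
}
```

Where the platform provides them, `explicit_bzero` (glibc and the BSDs) or the optional C11 Annex K `memset_s` are intended for this job and are probably a better choice than a hand-rolled loop.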

I absolutely agree with this. GCC implements ANSI C, and that unfortunately includes undefined behavior for various reasons. The problem with undefined behavior is that it's, well, undefined. Different compilers might choose different things, because different developers have different mental models of "what makes sense" in various situations. Which is hardly an improvement. Also, it wouldn't be ANSI C but some unknown mutation of C.
I believe the complaint is actually that the compiler makes secure code insecure by removing checks that rely on undefined behavior (which presumably can't be made any other way).
That complaint isn't a valid complaint. If the checks relied on undefined behavior, the code wasn't secure. If you want to rely on the behavior of a specific version of a specific compiler, then you need to define that in your dependencies instead of pretending that you've written general-purpose C code. This isn't even just a GCC problem; compiling the code on a different compiler breaks this too.
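As an illustration of a check that relies on undefined behavior and a way to write it without that reliance, consider signed integer overflow. This example is not from the thread and the function names are hypothetical. The test `a + b < a` only works if overflow wraps, which the standard does not promise for `int`, so a compiler may fold it to false. The sketch below tests against the limits before doing the addition.

```c
#include <limits.h>
#include <stdbool.h>

/* The fragile form, shown only as a comment because it invokes UB:
 *     if (b > 0 && a + b < a) { handle overflow }
 * The compiler may assume a + b never overflows and delete the branch. */

/* Defined-behavior check: decide from the operands alone, before adding. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow when b <= 0 */
}

/* Adds only when the result is representable; returns false otherwise. */
bool checked_add(int a, int b, int *out)
{
    if (add_would_overflow(a, b))
        return false;
    *out = a + b;
    return true;
}
```

GCC and clang also offer `__builtin_add_overflow` for the same purpose, and GCC's `-fwrapv` option makes signed overflow wrap, at the cost of tying the code to that compiler setting.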
Yes, I'm really happy that GCC optimizes that code away.

Most of us don't care too much about security issues when using C/C++. We use it for performance, and mostly locally.

GCC is a very versatile compiler. It's ok that it makes secure code difficult to write because that's not what most of us are doing. Not being completely secure is ok, not being optimized is not.

Yeah, I'd go so far as to argue that if you need security, you probably shouldn't be writing C.
What you and a lot of other people are missing, cheerful in your use of other languages, is that your runtimes and native extensions usually depend on insecure C code.
I'm not missing that; you're making an assumption.
It's not GCC's job to break my program just in case I might one day run it on a broken compiler. That's like the fire marshal burning down my house to demonstrate how it violates fire codes.

There's also nothing wrong with writing a C program that rests on a base of POSIX, or the GNU system, and requires stronger guarantees than C alone provides.

What GCC is doing subverts the purpose of writing in C, which is to get close to the machine and instruct its processor to do certain things. Optimizations are useful when they allow the compiler to better express our intent, but lately, we've seen compilers more and more often ignore our plain, stated intent, then back it up with a reference to a specification that wasn't intended to allow this subversion.
> What GCC is doing subverts the purpose of writing in C, which is to get close to the machine and instruct its processor to do certain things.

That's not the purpose of writing in C. It's a goal you might be able to achieve with C, but I'm not sure why that's your goal, and it's certainly not the goal of everyone who writes C. I think more people who write C do so with the goal of producing programs that run fast or with a minimal memory footprint.