by bcoates 4892 days ago
Hey, don't lump C++ in with this. If you write code in the STL weenie style or the Pretend It's Java style there aren't any idioms I know of that would ever violate the rules he mentions (out-of-range pointers, signed overflow, invalid aliasing). I don't do those things and the C++ programmers I work with don't do those things, at least not habitually. I don't see violations of undefined behavior rules, or the use of idioms that come close to it, very often in our code. Not nearly as often as the sort of mundane errors that no language can prevent.

These are not problems of a language per se, but the original sins of neo-vaxocentrism and of confusing "I understand how this might work, at some random abstraction layer" with "I can depend on what happens when I do something stupid". Free your mind of these and the rest will follow.

These low-level bit banging errors are vastly less common than shared-memory concurrency issues, which as far as I can tell are endemic to all code that attempts shared-memory concurrency, in any language. If you want to have an axe to grind about languages that aren't future proof, look there.


"If you write code in the STL weenie style or the Pretend It's Java style there aren't any idioms I know of that would ever violate the rules he mentions (out-of-range pointers, signed overflow, invalid aliasing)."

What does the STL do about signed overflow? As for out of range pointers, that is an easy one to get with the STL:

  vector<int> somevector(100);
  somevector[200] = 5;

"These are not problems of a language per se"

Yes they are: the default numeric type is fixed-width, pointers pop up all over the place and pointer dereferences are unchecked by default. Personally, though, I would have chosen (as the article's author did) the more severe deficiencies in the standard, like the lack of any requirement that a function with a non-void return type have a return statement along every control path or the fact that there is no reliable way to signal errors that occur in destructors.
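On the fixed-width point, here is a minimal sketch of guarding an int addition before it can overflow. The name checked_add is a hypothetical helper made up for illustration, not something the STL provides; the only library facility assumed is std::numeric_limits.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: stores a + b in out and returns true, or returns
// false (leaving out untouched) if the sum would not fit in an int.
// The comparisons are arranged so that no intermediate value overflows.
bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    out = a + b;
    return true;
}
```

The caller has to remember to use the helper instead of a bare +, which is the "pit of failure" complaint in a nutshell.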

"These low-level bit banging errors are vastly less common"

Not in my experience, and not judging by the number of bug reports and vulnerabilities I have seen that stem from low-level mechanics.

I'm not saying the language is some sort of security barrier that prevents any error, I'm saying sanely styled code does not have these issues in practice. The solution is "don't do that, and cultivate habits that will not cause you to do that by accident", not having the compiler make up semantics for broken code or putting in checks everywhere. Just because someone, somewhere does it wrong, doesn't mean it's impossible to do it right.

this:

  vector<int> somevector(100);
  somevector[200] = 5;
Is a C idiom translated by cut-and-paste. The unmotivated poking of arbitrary magic-number offsets into a magic-number sized vector is not proper. It's the kind of thing that sets off alarm bells on even the most casual of review.

"I'm saying sanely styled code does not have these issues in practice"

Otherwise known as the "just do it right" argument. This is an argument that goes all the way back to the days of writing everything in assembly language, and it was just as wrong then as it is today. If only a restricted subset of a language can ensure that basic issues do not become serious problems, then the language should be restricted to that subset.

"not having the compiler make up semantics for broken code or putting in checks everywhere"

Really? I would rather have the compiler put in run time checks whenever it cannot infer that no input will cause the program's behavior to be undefined. Thus, the compiler might insert a check here:

  for(i = 0; i < input.length(); i++)
    some_vector[i]++;
but not here:

  for(i = 0; i < min(input.length(), some_vector.length()); i++)
    some_vector[i]++;
nor here:

  if(input.length() > some_vector.length()) {
    throw some_exception();
  }
  for(i = 0; i < input.length(); i++)
    some_vector[i]++;
  
At the very least, requiring bounds checks on array access would create a definition for out-of-bounds pointers: program termination (or perhaps an exception being thrown). A reasonably good compiler can detect when a bounds check is unnecessary and can remove the bounds check as an optimization. Why shouldn't this be something that compilers do? Out-of-bounds array access is never a good thing. (Oh, wait, you might be dereferencing some arbitrary pointer that you got by some means other than allocating memory with "new" -- OK, fine, but that is what type systems are for; this sort of separation is not unheard of, I see it in Lisp with SBCL's FFI.)
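For what it's worth, std::vector already offers an opt-in version of the "exception being thrown" definition through at(). A minimal sketch, where try_increment is a hypothetical wrapper invented here for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: increments v[i] and reports whether i was in range.
// std::vector::at() does the bounds check and throws std::out_of_range,
// so an out-of-range index has defined behavior here.
bool try_increment(std::vector<int>& v, std::size_t i) {
    try {
        v.at(i)++;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // nothing was written outside the vector
    }
}
```

With a 100-element vector, try_increment(v, 200) returns false, whereas v[200] = 5 is undefined. The check is per call; whether a compiler hoists or removes it is up to the optimizer.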

"The unmotivated poking of arbitrary magic-number offsets into a magic-number sized vector is not proper. It's the kind of thing that sets off alarm bells on even the most casual of review."

Perhaps so, but then the answer is not simply "just use the STL." As with most things C++, it requires a long list of things to make code work right, and even people who have been writing C++ code for many years are sometimes surprised to discover that something they thought was fine is actually bad. C++ makes it pretty easy for programmers to do the wrong thing and needlessly difficult to do the right thing, which is why years of expertise are needed to write remotely reliable C++ code.

I really don't have any difficulty finding programmers who have the discipline to not use the unsafe parts of the language all over the place. C++ has an issue with having a fragmented multitude of sane subsets, but any of them are fine if they get the job done.

That said, I don't understand why you still keep putting up awful mostly-C code as if any trained C++ programmer wouldn't yell at you for doing it wrong, even before they saw the part with the error.

  for(i = 0; i < input.length(); i++)
Where did you learn this? Don't do this. Everyone else knows not to do this.

  for(i = 0; i < min(input.length(), some_vector.length()); i++)
This is actually worse, though it does have the virtue of probably working. If you want a run-time check, use at(), or better still use an iterator already.
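A rough sketch of what the iterator style might look like for the loops quoted above. The function names increment_all and increment_first are made up for illustration, and the range-based for assumes a C++11 compiler:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Increment every element. There is no index, so there is no
// out-of-range subscript to get wrong.
void increment_all(std::vector<int>& v) {
    for (int& x : v) ++x;
}

// Increment at most n leading elements. The end iterator is clamped
// to the vector's size, so n may safely exceed v.size().
void increment_first(std::vector<int>& v, std::size_t n) {
    auto last = v.begin() + static_cast<std::ptrdiff_t>(std::min(n, v.size()));
    std::for_each(v.begin(), last, [](int& x) { ++x; });
}
```

The clamp in increment_first is the same min() as in the quoted loop, just computed once instead of on every iteration.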

C++ has all sorts of issues. It's too hard to learn, it's missing some very useful features, and it has a number of rough edges that you have to learn your way around. But the things being complained about in the OP and by you are not real problems for anything but beginners. There just aren't that many naked array accesses or pointer math operations going on in an ordinary C++ application written in non-C style.

Iterators don't protect against iterator invalidation due to e.g. emptying a vector while you iterate over it. Accessing elements through an invalidated iterator is undefined behavior and can lead to exploitable security vulnerabilities.

Even modern C++ has very unsafe parts.
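To make the invalidation hazard concrete, here is a minimal sketch of the usual way around it when erasing during iteration. remove_evens is a hypothetical example function; the only library behavior relied on is that vector::erase returns an iterator to the element after the erased one.

```cpp
#include <cassert>
#include <vector>

// Remove all even values from v.
// erase() invalidates the iterator passed to it (and every later one),
// so writing "v.erase(it); ++it;" would be undefined behavior.
// Continuing from the iterator that erase() returns is well defined.
void remove_evens(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it);  // valid iterator to the next element
        } else {
            ++it;
        }
    }
}
```

Nothing in the type system stops the broken form from compiling, which is the point being made above.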

Iterator invalidation is a giant hassle, although it's easier to catch with debug or safety-mode libraries. I kind of got dragged into a derail ranting about the bizarre strawmen above.

My core point is that the OP has a theory about there being a school of C programmers that intentionally or unintentionally invoke undefined behavior and expect the compiler to do the right thing. He's doing a pretty good job of backing it up, although I'm not sure I understand what exactly he's proposing to do about it.

... And then he just kind of throws C++ in for the ride, presumably on the argument that C++ is just like C with even more cases for undefined behavior. But that's not correct, because he's making both a technical and a cultural argument. While C++ is technologically (mostly) a superset of C, the culture is completely different, to the extent that Linus famously argued that the main advantage of using C is that it keeps all the C++ programmers out. http://article.gmane.org/gmane.comp.version-control.git/5791...

Within the widely fragmented C++ user base, there are multiple popular methods of development that encourage true high-level development, where you are encouraged to target your code to the abstract/portable machine that the standard uses rather than your personal guess of how the compiler should work, and to avoid doing things that require inordinate care to get right.

Again, C++ is full of practical problems, but the kind of undefined behavior cases the OP worries about don't really rank up there among them.

You're arguing a straw man here. bcoates is saying, and I agree, that the usual examples given of how horrible C++ is are not idiomatic C++ and are used only by people who don't have any experience using C++. Of course it's easy to come up with examples of when things might go wrong. C++ is a powerful language, and with great power comes great responsibility, pardon the pompousness of that phrasing. C++ isn't perfect by a long shot, but the reasons brought forth in the OP and most of this discussion are not examples of real problems.

If those things are bad, why does the standard allow them? If any trained C++ programmer would know not to write that sort of code, what purpose does allowing it serve?

Because sometimes for loops are useful. It's no use trying to list all the uses of for that are safe and/or useful, and prohibit those that aren't. There is a trade-off between designing all sorts of safety checks into a language and raw power, and C++ mostly errs on the side of raw power. Which is largely why it's so prevalent and dominant.

I own a pneumatic nail gun that has two rather rudimentary safety features in the form of a trigger lock and a switch near the end of the 'barrel' that prevents it from being activated without the barrel pressing against something (in normal use, the wood you're nailing). It's rudimentary and still accidents happen with nail guns. I sometimes purposely circumvent the safety measures to get something done, e.g. when I'm shooting nails at a weird angle. It would be possible to think of many more safety features: allowing the gun to be operated only when activated with two hands (preventing one from shooting in one's hand), having all sorts of electronics that detect the surface that is being shot into, etc. Not a single one of those guns would get sold because they cripple the way you work too much to be convenient.

The more powerful features of C and C++ can be "bad" when misused or abused. Yes, that can happen when in the hands of a beginner.

There are times, however, when an experienced, knowledgeable user does need the power these features provide. Amazing things can be accomplished that couldn't otherwise be done when using a language like Java, C#, Ruby, Python, Perl, Go, Haskell, or Scheme.

>Otherwise known as the "just do it right" argument.

Maybe, maybe not. Can you make a better example than arbitrary magic numbers used as pointers?

Hey, the standard allows it. Why are we excluding code that is allowed by the standard and that many programmers would write if we are not making the "just do it right" argument?