by pkaler 4893 days ago
> His suggestion #3, that the standards should define more of the commonly used behavior and leave less of it undefined, wouldn't even require C programmers to do anything about it themselves.

I've written Windows, Mac, Linux, Xbox, PlayStation, PSP, iOS, and Android code. The memory model is subtly different for each platform. I just don't think you can define certain behaviour and have that work across disparate platforms.

I haven't really written any device drivers or kernel space code but I would imagine it would make the job even more difficult.


You underestimate how much undefined behaviour is in typical C programs and how little of it is yet taken advantage of by compilers.

The compilers are now starting to rewrite the original code fairly radically, in ways the author would not recognize, simply because some undefined behaviour exists within the code. You need to be increasingly language-lawyerly to avoid the compiler outsmarting you, almost as if it were a hostile opponent.

The read of an uninitialized variable in the article was a good example.
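A related, widely cited case is signed integer overflow, which C leaves undefined. Because of that, a compiler may assume `x + 1 > x` is always true and fold it to a constant, which surprises anyone whose mental model is "it wraps like the hardware does". A minimal sketch of a well-defined alternative, assuming the goal is simply to add two `int`s and report overflow; the name `add_checked` is hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true,
   or returns false if the sum would overflow int.
   The range check happens before the addition, so no signed
   overflow (undefined behaviour) is ever evaluated. */
bool add_checked(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;           /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;           /* sum would go below INT_MIN */
    *out = a + b;
    return true;
}
```

GCC and Clang also provide `__builtin_add_overflow` as a compiler extension for the same job; the portable version above only relies on `<limits.h>`.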

The problem is that programmers have a mental model of how the C they write turns into machine code, and that model is increasingly out of date as compilers search for more performance. The compiler is becoming less predictable, in precisely the way we argued against "sufficiently smart compilers" in the past for languages higher-level than C: that you wouldn't be able to predict when the smart compiler was smart enough to optimize your high-level construct. Now you're increasingly unable to predict what the compiler will turn your code into, unless you have a deeper understanding of the rules.

The "hostile opponent" analogy is a good one. C was always intended as a kind of higher-level replacement for assembly, so it was reasonable to assume (for instance) that uninitialized integer variables contain some unspecified but definite value, but recently compilers have been deliberately breaking those assumptions just because they can. It's almost reached the point where C isn't useful for its original purpose of systems programming; it's very hard to write threaded code that doesn't rely on undefined behaviour, for instance.
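Since C11 the standard does define a memory model for threads: an unsynchronized read/write of a plain `int` from two threads is a data race and therefore undefined, whereas operations on `_Atomic` objects from `<stdatomic.h>` are well-defined. A minimal sketch, assuming a POSIX system with pthreads (older toolchains may need `-pthread` when linking); `worker` and `count_in_parallel` are hypothetical names:

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define ITERS 100000

/* Shared counter. With a plain `int` and `counter++`, concurrent
   increments would be a data race: undefined behaviour. */
static atomic_int counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&counter, 1);   /* well-defined atomic increment */
    return NULL;
}

/* Hypothetical driver: runs NTHREADS workers and returns the final count,
   or -1 if not every thread could be created. */
int count_in_parallel(void)
{
    pthread_t threads[NTHREADS];
    int created = 0;

    atomic_store(&counter, 0);
    for (; created < NTHREADS; created++)
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
            break;
    /* Join every thread that did start, even on partial failure. */
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return created == NTHREADS ? atomic_load(&counter) : -1;
}
```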
Ostensibly, a platform like Java or Rust is supposed to abstract stuff like the memory model. I haven't written a lot of Java code, especially not Java code that runs on many different native systems/VMs, but from my perspective of blissful ignorance, it seems to have done the job?

Same with other high-level VM based languages like Python...

For most programs, yes, they have done their job. However, in certain categories of applications (for example, server software) it's somewhat of a leaky abstraction. Garbage-collector sweeps, circular references, etc., are all pains that force you to be aware of how the VM is managing your memory.

For an illustration, see this excellent blog series on how a particular garbage-collector sweep issue was solved: http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-h...

It works, but to get C to expose the same memory model on different platforms you would have to compromise the performance and close-to-the-hardware nature that are the only reason to use C nowadays.
And of course, there's no strong guarantee that your JVM/Python/Ruby/other VM doesn't suffer from any of the quoted C/C++ issues.
Python is not future-proof.

There are undefined sequences even in Python, where Jython and CPython produce different output.

Small amounts of undefined behaviour are normal in most language specs, though, to give implementations flexibility. Tests to make sure you do not rely on them would be useful.
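C has a close analogue of the Jython/CPython discrepancy, although strictly it is unspecified rather than undefined behaviour: the order in which function arguments are evaluated is left to the implementation, so `make_pair(next_id(), next_id())` may see its arguments in either order. A minimal sketch of the usual fix, which is to sequence the calls explicitly; `next_id`, `pair`, `make_pair`, and `make_pair_ordered` are hypothetical:

```c
/* Hypothetical counter with a side effect. */
static int next_value;

static int next_id(void)
{
    return ++next_value;
}

typedef struct { int first, second; } pair;

static pair make_pair(int a, int b)
{
    pair p = { a, b };
    return p;
}

/* make_pair(next_id(), next_id()) could yield {1, 2} or {2, 1}
   depending on the compiler. Separate declarations are sequenced,
   so this version always gives `first` the earlier id. */
pair make_pair_ordered(void)
{
    int a = next_id();
    int b = next_id();
    return make_pair(a, b);
}
```

A test that checks `first < second` would pass with this version on every conforming compiler, which is the kind of "don't rely on it" check the comment suggests.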
Not to mention the GIL...
The guy behind "Embedded in Academia" knows quite a bit about the memory models supported by C, and has done some marvelous work on testing C compilers, C code, and undefined behavior. If he claims it is possible to improve the situation and leave less behavior undefined, he's most probably right.
You certainly could define some basic things to make the language safer. For example, make variables always be initialized to zero if not explicitly initialized, and force accessing beyond the bounds of an array to be a fault rather than undefined behavior.
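In library form, the bounds-checking half of that proposal might look like an accessor that terminates the program on an out-of-range index instead of reading past the array. This is only an illustration, assuming `abort()` counts as an acceptable "fault"; `int_array` and `checked_at` are hypothetical names, not part of any standard:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical array-with-length pair. */
typedef struct {
    int *data;
    size_t len;
} int_array;

/* Bounds-checked read: an out-of-range index terminates the program
   deterministically instead of invoking undefined behaviour. */
int checked_at(int_array a, size_t i)
{
    if (i >= a.len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, a.len);
        abort();
    }
    return a.data[i];
}
```

The price is one compare and branch per access, the same kind of overhead C++'s `std::vector::at()` pays before throwing `std::out_of_range`.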
You could, but that comes at a cost. That's why libraries like the STL in C++ provide std::vector::operator[] and std::vector::at() - so the user can freely choose whether to pay the extra cost for the bounds check, or not. That's why C provides both malloc() and calloc() - so the user can freely choose whether memory is zero-initialized, or not.
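In C the choice looks like this: `malloc()` returns memory with indeterminate contents, while `calloc()` returns memory with every byte set to zero (for `int` that means the value 0). A minimal sketch; `alloc_zeroed_ints` and `alloc_filled_ints` are hypothetical wrappers:

```c
#include <stdint.h>
#include <stdlib.h>

/* Pay for zeroing: calloc clears every byte of the block. */
int *alloc_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Don't pay: malloc leaves the contents indeterminate, so the caller
   must write every element before reading it. Here we fill explicitly. */
int *alloc_filled_ints(size_t n, int value)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;                 /* n * sizeof(int) would overflow */
    int *p = malloc(n * sizeof(int));
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = value;
    return p;
}
```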

One of the major design decisions for C/C++ is that you don't pay for what you don't use. This is what makes them so flexible and performant across a wide range of systems and applications, but also leaves these safety choices up to the user. Some languages make that tradeoff, but it's not always the right decision.

On the other hand there are languages where correctness comes before speed, and they still provide you the mechanisms to get speed if you really want.

For example, in the Pascal family of languages, you can always disable bounds checking or do pointer arithmetic if you really want to, but that should only be done if there is really the need to do so.

A problem with many C and C++ developers is that they suffer from premature optimization, thinking that we are still targeting PDP-11 like environments.

Initializing variables to zero doesn't buy you much in terms of safety, IMHO. The value 0 isn't necessarily any more valid than an arbitrary value. Better is Java/ML/Haskell's rule whereby variables must be explicitly initialized before use. This can be implemented with a simple compiler pass.
At least the value 0 is always the same and doesn't subtly change from one invocation to the next or from one machine to the next. It certainly helps in making programs more robust, even if there is still a problem at code level.
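C already guarantees this in some places: objects with static storage duration are zero-initialized when no initializer is given, and an initializer like `{0}` sets every remaining member of an aggregate to zero. Only automatic (local) variables without an initializer are left indeterminate. A small sketch; `struct config`, `bump_call_count`, and `default_config` are hypothetical:

```c
/* Hypothetical configuration record. */
struct config {
    int retries;
    int timeout_ms;
    double scale;
};

/* Static storage duration, no initializer: guaranteed to start at 0. */
static int call_count;

int bump_call_count(void)
{
    return ++call_count;
}

/* `= {0}` initializes the first member explicitly and every remaining
   member as if it had static storage duration, i.e. to zero. */
struct config default_config(void)
{
    struct config c = {0};
    return c;
}

/* By contrast, `int x;` inside a function is indeterminate, and reading
   it before assigning it is the kind of code the article warns about. */
```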
Java? Everything (except for built-in types) is nullable in Java...
pcwalton's point is that you must be explicit about initializing variables:

  int foo = 1;
  int bar;
  System.out.println(foo + bar);  // compile error: variable bar might not have been initialized