by haberman 4803 days ago
There are some subtle problems with the model as explained in this article. If you use this as your mental model, you will probably run afoul of undefined behavior without realizing it.

If you read the C standard, you'll notice it doesn't talk much about "memory" (the word only appears 13 times in C99); it mostly talks about "objects" (mentioned 735 times in C99). These objects aren't OO-objects -- obviously C doesn't have OOP built in -- but rather all the basic types like int, float, struct, etc are objects. When you declare a variable like "int x", you are creating an object.

C's aliasing rules dictate that you can only access an object through an lvalue whose type is compatible with that object's actual type (character types such as char are the main exception, which is why memcpy works). This is why it is dangerous to think of the assignment operator as a simple memory-copying operation. If assignment were a simple memcpy, you could do something like this:

  int x = 5;
  // BAD: undefined behavior, violates aliasing.
  short y = *(short*)&x;
If a variable were just a memory address and assignment were just a memory copy, this would be a valid operation. But the right way to think of it is that a variable is a storage object whose address can be taken, and a dereference is an operation that reads a storage object.

A pointer isn't a generic memory-reading facility; it must actually point to a valid storage object of the pointer's type (or to NULL).

If you do want to read and write arbitrary objects in memory, you can always use memcpy():

  int x = 5;
  short y;
  // This is fine, and smart C compilers optimize away the
  // function call.
  memcpy(&y, &x, sizeof(y));

> If a variable were just a memory address and assignment were just a memory copy, this would be a valid operation.

It's a valid operation regardless of whether a standards body says it's not.

  uint32 x = 5;
  uint16 y = *(uint16*)&x;
The effect is to set y to the first two bytes of memory from x. Values assigned to x are serialized into memory in either big-endian or little-endian order. Those are the only two cases you have to account for. The Quake 3 engine has a macro for the above operation which produces the same value of y on all platforms. This is useful for serializing x to disk, then loading it later (and possibly on a different architecture).
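
For what it's worth, the "same value on all platforms" part can be had without casting pointers at all, by building the bytes with shifts. A minimal sketch, assuming a little-endian on-disk format; put_u32_le and get_u16_le are hypothetical names for illustration, not the Quake 3 macro:

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[0..3], least significant byte
   first. Shifts operate on the value, not on its memory layout, so the
   bytes come out the same on big-endian and little-endian hosts. */
static void put_u32_le(uint8_t out[4], uint32_t v) {
  out[0] = (uint8_t)(v & 0xff);
  out[1] = (uint8_t)((v >> 8) & 0xff);
  out[2] = (uint8_t)((v >> 16) & 0xff);
  out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: read a 16-bit little-endian value from in[0..1]. */
static uint16_t get_u16_le(const uint8_t in[2]) {
  return (uint16_t)(in[0] | ((unsigned)in[1] << 8));
}
```

Calling put_u32_le(buf, x) and then get_u16_le(buf) yields the low 16 bits of x on any host, and buf can be written to disk as-is.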

One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.

int8, int16, int32, int64 are all explicit and force the compiler (and the hardware) to obey the wishes of the programmer. This is, I think, the right approach. People make much ado about the fact that "a byte isn't necessarily 8 bits" and "the only assumption you can make about a short is that it's smaller than an int, and larger than a char", etc, which is probably unnecessary mental effort.

"Bytes are 8 bits. Here are four bytes. Here's the value that the four bytes store. Copy two of the four bytes to this other spot (adjusting for endianness appropriately via a macro)."

You typically don't want a memcpy in situations like this due to endianness.

The reason it's useful to explicitly "break the rules" like this is because it's important to know what assumptions you in fact can rely on, regardless of what standards bodies have to say about it. Because at that point you can do incredible things such as http://www.codercorner.com/RadixSortRevisited.htm

   inline float fabs(float x){
        // Mask off the sign bit on the integer view, then reinterpret as float.
        unsigned int bits = (unsigned int&)x & 0x7fffffff;
        return (float&)bits;
   }
The reason this is incredible and awesome (rather than horrible and dangerous) is because it enabled game developers to achieve a more impressive product for end users, because they were able to do more with the CPU resources that were available at the time.

It's of course not so relevant nowadays, since it's reasonable to assume that most gamers have at least a Core 2 Duo. But it's one of those things that isn't relevant until suddenly it is -- you're in some situation that requires sorting millions of floats, and your dataset simply demands more performance than your compiler typically gives you. Then suddenly you find you can do amazing things like this, and surprise people with how effectively you can use a modern CPU.

(Although, the modern antidote to "I need to sort millions of floats quickly" is to use SSE, not to sort floats as integers. Yet that's even more evidence that it's better to understand the capabilities of the hardware.)
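
For comparison, the sign-bit trick in fabs above can probably be written in plain C without reinterpreting pointers or references. A minimal sketch, assuming float is a 32-bit IEEE 754 single; fabs_bits is a hypothetical name, not code from the linked article:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: absolute value by clearing the sign bit.
   Assumes float is a 32-bit IEEE 754 single. */
static float fabs_bits(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);  /* copy out the float's representation */
  bits &= UINT32_C(0x7fffffff);    /* clear the sign bit */
  memcpy(&x, &bits, sizeof x);     /* copy the modified bits back */
  return x;
}
```

The memcpy calls copy bytes rather than accessing a float through an integer lvalue, so the aliasing rules are not involved.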

> It's a valid operation regardless of whether a standards body says it's not.

Whoa there, cowboy. You may not feel personally beholden to standards bodies, but compiler vendors are following their lead. The major compilers are getting more and more aggressive about optimizing away undefined behavior every year.

> The effect is to set y to the first two bytes of memory from x.

No, it's really not. It's undefined behavior and the compiler is free to do absolutely whatever it wants.

> One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.

I agree with this, and have made this argument before: http://blog.reverberate.org/2013/03/cc-gripe-1-integer-types...

But this is an entirely separate issue.

> No, it's really not. It's undefined behavior and the compiler is free to do absolutely whatever it wants.

The point is that compilers do some specific thing, regardless of the fact that the standards bodies say they're free to reboot your computer.

As long as all you care about is x86/x86_64/PowerPC (and probably ARM as well), then you can trust that the compiler is going to generate code which copies the first two bytes of x into the memory occupied by y.

>As long as all you care about is x86/x86_64/PowerPC (and probably ARM as well), then you can trust that the compiler is going to generate code which copies the first two bytes of x into the memory occupied by y.

That's the thing that haberman is trying to tell you: you can't trust that any more, even with architectures you think you know. What you said was true about 10 years ago, but things have changed. Go read about "-fno-strict-aliasing" [1].

[1] http://thiemonagel.de/2010/01/no-strict-aliasing/

There be dragons. The following program prints 10 on gcc 4.6.3, x86-64:

  #include <stdio.h>
  #include <stdint.h>

  void f(uint32_t *x, uint16_t *y) {
    *x = 5;
    printf("%d\n", *y);
  }

  int main() {
    uint32_t x = 10;
    f(&x, (uint16_t*)&x);
  }
The antidote is to put a memory barrier in between the assignment and the printf.

  *x = 5;
  __sync_synchronize();
  printf("%d\n", *y);
http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins....

The reason this example is fundamentally different from my example is because mine doesn't create two objects that point to the same memory. In such situations, memory barriers are necessary. Also, your program won't work on different platforms due to endianness.

That is not what memory barriers are for, at all. Memory barriers are a sequencing primitive for shared-memory concurrency (an excellent intro is here: http://lxr.linux.no/linux/Documentation/memory-barriers.txt). They are never required for correctness in valid single-threaded programs.

The memory barrier "fixed" this program similarly to how a cruise missile "fixes" a termite problem. It was just a coincidence and it was the wrong tool for the job.
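
A tool that does fit the job is memcpy. A minimal sketch of the function f from the program above, assuming the goal is "read the first two bytes of whatever x points to"; f_fixed is a hypothetical name, and it returns the value instead of printing it so the result is easy to check:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical rewrite of f(): y is taken as a plain address, and the
   two bytes are copied out with memcpy instead of being read through
   a uint16_t lvalue. y may point into the same storage as x. */
static uint16_t f_fixed(uint32_t *x, const void *y) {
  uint16_t first_two;
  *x = 5;
  memcpy(&first_two, y, sizeof first_two);  /* byte copy; no aliasing violation */
  return first_two;
}
```

Called as f_fixed(&x, &x), this has to observe the store to *x. The value it returns still depends on endianness, since it is the first two bytes of a 32-bit 5 in memory order.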

> The reason it's useful to explicitly "break the rules" like this is because it's important to know what assumptions you can in fact rely on, regardless of what standards bodies have to say about it.

Given that compilers do break when programmers violate aliasing rules, you should recheck what assumptions you think you can rely on. Non-strict aliasing is not one of them, unless you are willing to slow everything down with compiler-specific flags like -fno-strict-aliasing.

    uint8_t foo[4]; *(uint32_t*)foo = 0;
Besides even without strict aliasing, the above is not at all guaranteed to work since not all architectures support unaligned loads. (and if you think "well but no one uses them, just like no one uses 1's complement architectures anymore", keep in mind that this includes ARM)

(also use stdint types already)

>   uint8_t foo[4]; *(uint32_t*)foo = 0;
> Besides even without strict aliasing, the above is not at all guaranteed to work since not all architectures support unaligned loads.

So, the interesting thing about this example is that it does work. It's in fact very, very difficult to find a platform where that example won't work (i.e. crashes the program). For example, any C library involving image manipulation is likely going to have code similar to what you've described, and those libraries work on almost every platform.

Standards are a good and useful thing. All I'm saying is that it's important to know which rules you can safely violate.

> It's in fact very, very difficult to find a platform where that example won't work

No, it isn't. Many ARM processors will bus error on that code if (foo & 3) != 0. I believe PowerPC doesn't do unaligned word reads either...

It quite often has to do with the memory controller and not with the particular processor, though I believe x86 has to support unaligned reads. I've certainly worked first hand with ARMs that did not support it.

That's interesting. What causes the bus error?

Would

  uint8_t foo[4];  *(uint32_t*)(&foo[0]) = 0;
also result in a bus error? Why?
That's the same thing, so yes: if foo is unaligned, it will cause a bus error. This happens because the generated code uses a store-word assembly instruction (as opposed to store-byte), and if the address is not aligned to 4 bytes, the memory controller hardware will raise a bus error.

Notice I keep saying "if the address is unaligned". The insidious part is that it probably will work for a while since it's likely that your "foo" array will happen to be aligned. But add one uint8_t variable to your structure or stack frame or wherever "foo" is defined and things could shift and suddenly it starts causing bus errors. It can be a very annoying type of heisenbug.

And bus errors are actually a good thing. I believe I've used hardware (an ARM or an SH2, can't remember) where the memory controller just ignored the last 2 bits during whole word reads and writes (which works fine as long as you only read aligned words). So if you run your code on that hardware, it doesn't give you an error; it just subtly "corrupts" your data. Yay!
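
One way to sidestep the alignment question is to never form the uint32_t pointer in the first place. A minimal sketch, assuming all you want is to move 4 bytes in host byte order; store_u32 and load_u32 are hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v at p, which need not be 4-byte aligned.
   memcpy makes no alignment assumption about either pointer. */
static void store_u32(uint8_t *p, uint32_t v) {
  memcpy(p, &v, sizeof v);
}

/* Hypothetical helper: load 4 bytes from p, aligned or not. */
static uint32_t load_u32(const uint8_t *p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```

With these, store_u32(foo, 0) replaces the cast-and-assign from the example, and it keeps working when foo lands on an odd address.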

> any C library involving image manipulation is likely going to have code similar to what you've described

...which actually is exactly how I found out first-hand that it doesn't always work. If you only ever test on x86 you'll never catch it. You might not even catch it on ARM if you're lucky.

Which is the point - that compilers can and do make use of almost all undefined behavior of C for optimizations, which one developer might not catch because their current compiler happened to work. Then a new version is released that can find and exploit more undefined behavior. And strict aliasing is one of those rules you can't safely violate.

>int8, int16, int32, int64 are all explicit and force the compiler (and the hardware) to obey the wishes of the programmer.

At least in C99, the compiler doesn't need to support exact-width integer types.

>People make much ado about the fact that "a byte isn't necessarily 8 bits"

Well, POSIX.1-2004 requires that CHAR_BIT == 8.
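
If code really does depend on 8-bit bytes and an exact-width 32-bit type, both assumptions can be checked at build time rather than argued about. A minimal sketch; assumptions_hold is a hypothetical helper added only so there is something to call at run time:

```c
#include <limits.h>
#include <stdint.h>

/* Fail the build if a byte is not 8 bits. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

/* <stdint.h> defines UINT32_MAX only when uint32_t itself is provided. */
#ifndef UINT32_MAX
#error "this code needs an exact-width 32-bit integer type"
#endif

/* Hypothetical helper: the same assumptions as a run-time check. */
static int assumptions_hold(void) {
  return CHAR_BIT == 8 && sizeof(uint32_t) == 4;
}
```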

> It's a valid operation regardless of whether a standards body says it's not.

All the world's a VAX, sure. Don't mind the next generation of hardware coming down the pike and the next wave of compiler optimizations.

http://catb.org/jargon/html/V/vaxocentrism.html