| HN Mirror


by sillysaurus 4803 days ago

If a variable were just a memory address and assignment were just a memory copy, this would be a valid operation.

It's a valid operation regardless of whether a standards body says it's not.

  uint32 x = 5;
  uint16 y = *(uint16*)&x;

The effect is to set y to the first two bytes of memory from x. Values assigned to x are stored in memory in either big-endian or little-endian order, and those are the only two cases you have to account for. The Quake 3 engine has a macro for the above operation which produces the same value of y on all platforms. This is useful for serializing x to disk, then loading it later (possibly on a different architecture).
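One way to get the same bytes on disk and the same value of y on every platform is to build them with shifts instead of reading x's memory directly. This is only a minimal sketch, not the Quake 3 macro itself: the names `store_le32`, `load_le32`, and `low16` are made up for illustration, and it uses the `<stdint.h>` types in place of `uint32` and `uint16`.

```c
#include <stdint.h>

/* Write v into buf as four bytes, least significant byte first,
   regardless of the host's byte order. */
static void store_le32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v & 0xFFu);
    buf[1] = (uint8_t)((v >> 8) & 0xFFu);
    buf[2] = (uint8_t)((v >> 16) & 0xFFu);
    buf[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Rebuild the value from four little-endian bytes. */
static uint32_t load_le32(const uint8_t buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* The low 16 bits of v, by value rather than by memory layout. */
static uint16_t low16(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

Because the shifts operate on values, not on memory, a buffer written by `store_le32` on one architecture loads back to the same number with `load_le32` on another.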

One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.

int8, int16, int32, int64 are all explicit and force the compiler (and the hardware) to obey the wishes of the programmer. This is, I think, the right approach. People make much ado about the fact that "a byte isn't necessarily 8 bits" and "the only assumption you can make about a short is that it's smaller than an int, and larger than a char", etc., which is probably unnecessary mental effort.
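In standard C the closest equivalents are the `<stdint.h>` types, and the "bytes are 8 bits" assumption can be written down so the build fails if it ever stops being true. A small sketch, assuming a C11 compiler for `_Static_assert`; the helper name `total_width_bytes` is hypothetical:

```c
#include <limits.h>
#include <stdint.h>

/* Fail at compile time if the platform breaks the assumptions. */
_Static_assert(CHAR_BIT == 8, "bytes are assumed to be 8 bits");
_Static_assert(sizeof(uint16_t) == 2, "uint16_t is assumed to be 2 bytes");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is assumed to be 4 bytes");

/* Combined size in bytes of one uint16_t and one uint32_t. */
static unsigned total_width_bytes(void)
{
    return (unsigned)(sizeof(uint16_t) + sizeof(uint32_t));
}
```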

"Bytes are 8 bits. Here are four bytes. Here's the value that the four bytes store. Copy two of the four bytes to this other spot (adjusting for endianness appropriately via a macro)."

You typically don't want a memcpy in situations like this due to endianness.

It's useful to explicitly "break the rules" like this because it's important to know which assumptions you can in fact rely on, regardless of what standards bodies have to say about it. At that point you can do incredible things such as http://www.codercorner.com/RadixSortRevisited.htm

   inline float fabs(float x){
        unsigned int bits = (unsigned int&)x & 0x7fffffff;  // clear the sign bit
        return (float&)bits;  // reinterpret the masked bits as a float
   }

The reason this is incredible and awesome (rather than horrible and dangerous) is that it enabled game developers to ship a more impressive product for end users, because they could do more with the CPU resources available at the time.
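For comparison, the same sign-bit trick can be written in plain C with memcpy in place of the reference casts. This is a sketch that assumes float is a 32-bit IEEE 754 value; the name `fabs_bits` is made up so it doesn't collide with the library's `fabs`.

```c
#include <stdint.h>
#include <string.h>

/* Absolute value of x, computed by clearing the sign bit.
   Assumes a 32-bit IEEE 754 float. */
static float fabs_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the float's bytes into an integer */
    bits &= 0x7FFFFFFFu;              /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);      /* copy the masked bytes back */
    return x;
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain register move, so it need not cost more than the cast.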

It's of course not so relevant nowadays, since it's reasonable to assume that most gamers have at least a Core 2 Duo. But it's one of those things that isn't relevant until suddenly it is -- you're in some situation that requires sorting millions of floats, and your dataset simply demands more performance than your compiler typically gives you. Then suddenly you find you can do amazing things like this, and surprise people with how effectively you can use a modern CPU.

(Although, the modern antidote to "I need to sort millions of floats quickly" is to use SSE, not to sort floats as integers. Yet that's even more evidence that it's better to understand the capabilities of the hardware.)

4 comments

haberman 4803 days ago

> It's a valid operation regardless of whether a standards body says it's not.

Whoa there, cowboy. You may not feel personally beholden to standards bodies, but compiler vendors are following their lead. The major compilers are getting more and more aggressive about optimizing away undefined behavior every year.

> The effect is to set y to the first two bytes of memory from x.

No, it's really not. It's undefined behavior and the compiler is free to do absolutely whatever it wants.
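If the goal really is "the first two bytes of x in memory", memcpy expresses that without the aliasing problem. A minimal sketch, using `<stdint.h>` types in place of `uint32` and `uint16`; the function name `first_two_bytes` is hypothetical. The result still depends on the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Copy the first two bytes of x's object representation into a uint16_t.
   Defined behavior, but the value depends on endianness. */
static uint16_t first_two_bytes(uint32_t x)
{
    uint16_t y;
    memcpy(&y, &x, sizeof y);
    return y;
}
```

With x = 5, this yields 5 on a little-endian machine and 0 on a big-endian one.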

> One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.

I agree with this, and have made this argument before: http://blog.reverberate.org/2013/03/cc-gripe-1-integer-types...

But this is an entirely separate issue.