by robomartin 4936 days ago

OK, I pulled colors as an example out of thin air but I'll see if I can make it work.

Let's say you have to write a routine that does something based on a color as the input. You have a few choices in terms of how to represent the colors:

    - The name of the color in a string
    - A typedef enum for your colors (integers)
    - A color-per-bit scheme (U8, U16, U32)
    - Channel-per-bit scheme (U8)
    - 4 or 6 bit packed RGB values (U16 or U32)
    - 8 bit packed RGB values (U32)
    - 8 bit unpacked RGB values (struct of three U8)
    - 16 or 32 bit unpacked RGB values (struct of three U16 or U32)
    - Unpacked RGB floats (struct of three floats)
    - and more...

I won't go into the implications of each of the above. Some of it is highly dependent on both the system and the objectives of the work being done.

Say, for example, that you choose to use the names of colors stored in strings as your color representation. Now you have to compare strings in order to identify the colors:

    if (strcmp(input_color, "red") == 0)
    {
      // Do something with red
    }
    else if (strcmp(input_color, "green") == 0)
    {
      // Do something with green
    }
    else if (strcmp(input_color, "blue") == 0)
    {
      // Do something with blue
    }
    // ... etc

Regardless of language, the strings need to be compared character by character. Even if a language or OO framework allows you to say something like if(string1 == string2), you have to keep in mind that what is going on behind the scenes is pretty much exactly what strcmp() has to do. Which means that the above is, at the very least, slow.

And, of course, it isn't very portable. What happens if the input has to be in German or Japanese?

The typedef enum representation gives you the ability to use a far more efficient construct to identify your colors:

    switch(color)
    {
      case COLOR_RED:
        // Do something with red
        break;
      case COLOR_GREEN:
        // Do something with green
        break;
      case COLOR_BLUE:
        // Do something with blue
        break;
      // ... etc
    }

This is much, much faster. It is, at the core, an if/else-if structure that only compares integers, and each integer comparison is typically a single machine language instruction (a compiler may even turn the switch into a jump table). It is fast, clean and language-portable: a text-to-integer function at the input boundary deals with the different languages.
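
A minimal sketch of that text-to-integer step, assuming a small hypothetical color set. The names color_t, color_names and color_from_name are made up for illustration, as are the German spellings in the table:

```c
#include <string.h>

/* Hypothetical color set; the identifiers are illustrative only. */
typedef enum {
    COLOR_UNKNOWN = 0,
    COLOR_RED,
    COLOR_GREEN,
    COLOR_BLUE
} color_t;

/* One row per accepted spelling, so another language is just more rows. */
static const struct {
    const char *name;
    color_t     color;
} color_names[] = {
    { "red",   COLOR_RED   },
    { "rot",   COLOR_RED   },   /* German */
    { "green", COLOR_GREEN },
    { "gruen", COLOR_GREEN },   /* German, transliterated */
    { "blue",  COLOR_BLUE  },
    { "blau",  COLOR_BLUE  }    /* German */
};

/* Pay for the string comparisons once, at the input boundary.
   Returns COLOR_UNKNOWN when the name is not in the table. */
color_t color_from_name(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof color_names / sizeof color_names[0]; i++) {
        if (strcmp(name, color_names[i].name) == 0)
            return color_names[i].color;
    }
    return COLOR_UNKNOWN;
}
```

Everything after the call to color_from_name() can then switch on the integer value and never touch a string again.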

If you are on an embedded system that can do bit testing in machine language, it might make sense to encode one color per bit or one color channel per bit. For example, in some embedded C dialects you might be able to do something like this:

    if(color.0) // Select and test bit 0
    {
      // This is red
    }
    else if(color.1) // Select and test bit 1
    {
      // This is green
    }
    // ... etc

At this level the advantages of doing this are tightly linked to the platform and the goals of the application.
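
In standard C, without a dialect-specific bit selector, roughly the same idea can be sketched with masks. The bit assignments and the function names here are assumptions for illustration, not anything a particular platform defines:

```c
#include <stdint.h>

/* Hypothetical one-bit-per-color layout inside a U8. */
#define COLOR_BIT_RED   (1u << 0)
#define COLOR_BIT_GREEN (1u << 1)
#define COLOR_BIT_BLUE  (1u << 2)

/* Returns the index of the lowest color bit that is set, or -1 if none is.
   Each test is an AND with a constant, which many small CPUs can do
   with a single bit-test instruction. */
int first_color_bit(uint8_t color)
{
    if (color & COLOR_BIT_RED)   return 0;
    if (color & COLOR_BIT_GREEN) return 1;
    if (color & COLOR_BIT_BLUE)  return 2;
    return -1;
}

/* With one channel per bit, combining channels is a bitwise OR. */
uint8_t mix_channels(uint8_t a, uint8_t b)
{
    return (uint8_t)(a | b);
}
```

Whether the compiler actually emits a bit-test instruction for this depends on the target and the optimizer, so it is worth checking the generated code on the platform in question.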

If, for example, one needs to be able to expand the available range of color inputs beyond what can be described with simple words, a discrete RGB representation might be the best choice. This is also the case if you wanted to future-proof the program and be ready for when more colors arrive.

Here you have several choices, two of which are to represent each channel with an 8 bit value or choose floats instead.

The 8 bit values can be packed nicely into a U32, making it very efficient. You could also create a struct to facilitate access to the components and let the compiler optimize for you.
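
As a rough sketch of the packed form and the struct form side by side, assuming a 0x00RRGGBB layout (the layout and the names rgb8_t, rgb_pack and rgb_unpack are choices made for this example):

```c
#include <stdint.h>

/* Unpacked form: one U8 per channel. */
typedef struct {
    uint8_t r, g, b;
} rgb8_t;

/* Packed form, assumed layout 0x00RRGGBB in a U32. */
uint32_t rgb_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Pull the three channels back out with shifts; the casts keep the low 8 bits. */
rgb8_t rgb_unpack(uint32_t packed)
{
    rgb8_t c;

    c.r = (uint8_t)(packed >> 16);
    c.g = (uint8_t)(packed >> 8);
    c.b = (uint8_t)packed;
    return c;
}
```

Comparing two packed colors is a single integer compare, while the struct makes per-channel math easier to read.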

The float example is interesting because the conversion from float to whatever (if necessary) can be of any bit width. So, for example, if the color needs to ultimately be mapped to an 8-bit-per-channel display device you can translate from float to 8 bits on output. All of your intermediate math and color manipulation would be done in full-resolution floats, which means you accumulate far less rounding error than you would with 8-bit intermediates. This, for example, is important if you are applying FIR filters to calculate missing color sample data from certain video data formats.
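
A small sketch of quantizing only on output, assuming channels are normalized to the range 0.0 to 1.0. That range, the rgbf_t type and both function names are assumptions for this example:

```c
#include <stdint.h>

/* Unpacked float color, channels assumed to be in [0.0, 1.0]. */
typedef struct {
    float r, g, b;
} rgbf_t;

/* Intermediate math stays in float: average two colors channel by channel. */
rgbf_t rgbf_average(rgbf_t a, rgbf_t b)
{
    rgbf_t out;

    out.r = (a.r + b.r) * 0.5f;
    out.g = (a.g + b.g) * 0.5f;
    out.b = (a.b + b.b) * 0.5f;
    return out;
}

/* Quantize one channel to 8 bits at the very end: clamp, scale, round.
   The first test also sends NaN to 0 instead of into an undefined cast. */
uint8_t channel_to_u8(float v)
{
    if (!(v > 0.0f)) return 0;
    if (v >= 1.0f)   return 255;
    return (uint8_t)(v * 255.0f + 0.5f);
}
```

Rounding happens once per output channel, however many filtering steps ran before it.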

Packing has its issues as well. If you are dealing with little-endian vs. big-endian systems there might be overhead associated with unpacking and possibly rearranging a packed RGB value. If you are dealing with processing colors at a massive scale this can have performance and even power consumption implications.
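
One way to sidestep the byte-order question, sketched here under the assumption of a 0x00RRGGBB packed value and an R, G, B byte order on the wire, is to move bytes with shifts instead of copying the U32's memory directly. The function names are made up for this example:

```c
#include <stdint.h>

/* Store a packed 0x00RRGGBB value as the bytes R, G, B.
   Shifts operate on the value, not on its memory layout, so the
   result is the same on little-endian and big-endian hosts. */
void rgb_store_bytes(uint32_t packed, uint8_t out[3])
{
    out[0] = (uint8_t)(packed >> 16);
    out[1] = (uint8_t)(packed >> 8);
    out[2] = (uint8_t)packed;
}

/* Rebuild the packed value from the bytes R, G, B. */
uint32_t rgb_load_bytes(const uint8_t in[3])
{
    return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | (uint32_t)in[2];
}

/* Returns 1 on a little-endian host and 0 on a big-endian one, by
   looking at which byte of a U32 holding 1 comes first in memory. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;

    return *(const unsigned char *)&probe == 1;
}
```

The shifts are the overhead mentioned above: cheap per pixel, but they add up when processing colors at a massive scale.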

I may have been lucky in that my very first CS professor was hell-bent on teaching the importance of thinking deeply about data representation BEFORE thinking about code. He'd repeat this mantra until you were sick of hearing it. Years later I'd learn to appreciate this bit of wisdom in more ways than one.

2 comments

kragen 4936 days ago

I think it depends on a lot more of the circumstances than this. For example, in many languages, you can intern the strings so that a string equality test is just a single machine instruction. And if these color representations are crossing some interface that needs to be kept stable, it's a lot easier to add new colors if what's crossing is "red" or "#ff0000" than if it's "2". And it may be that what you're doing with the colors is just generating HTML, rather than doing multi-way switches, in which case the enum implementation has no advantage over the string representation; it just increases code duplication.

The probably more important consideration is that with an enum, your compiler can catch misspellings. Depending on your runtime environment, this can be a huge killer advantage. In particular, if your runtime environment can't do much beyond blink an LED to report errors, compile-time checking is really really important.
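
For what interning could look like in C, here is a deliberately naive sketch. The fixed-size table, the linear search and the name intern are all assumptions for illustration; real interning tables normally use a hash table:

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_MAX 64   /* hypothetical fixed capacity */

static const char *intern_table[INTERN_MAX];
static size_t intern_count;

/* Returns one canonical pointer per distinct string contents,
   or NULL if the table is full or memory runs out. */
const char *intern(const char *s)
{
    size_t i, len;
    char *copy;

    /* The character-by-character comparison happens here, once per intern() call. */
    for (i = 0; i < intern_count; i++) {
        if (strcmp(intern_table[i], s) == 0)
            return intern_table[i];
    }
    if (intern_count == INTERN_MAX)
        return NULL;

    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    intern_table[intern_count++] = copy;
    return copy;
}
```

Once both sides have been interned, testing for equality is a pointer comparison (a == b) instead of a call to strcmp().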

robomartin 4936 days ago

I think we are slicing this a little too thin. My original point is that it is important to understand that the choices made when representing data can be important. My off-the-cuff example was not meant to be definitive.

tomasien 4936 days ago

Can I just say that THIS is why I love Hacker News. People are serious about sharing their ideas on a deep level.

Thanks for sharing!
