by hectorhector 2690 days ago
What's the purpose of this code in the original decompressor? Assuming c is an unsigned char, aren't EOF and 0xFF equal?

  if (c == 0xFF) {
    len = getchar();
    c = getchar();
    while (len--)
      emit(c);
  }
3 comments

c should not be a char or unsigned char, because the return type of getchar() is "int". If you put it into a char-width variable then you lose the distinction between EOF (which is -1) and the byte 0xff. Getting the type of 'c' wrong is quite a common bug, because the API makes it an easy mistake to make. In this case, if you look down at the eventual transformed code in the "Evaluation" section you'll see that 'c' is indeed correctly declared with 'int' type.
getchar() returns an int so that it can represent 257 distinct values: all 256 possible byte values plus EOF (a negative value, typically -1).
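
To make the difference concrete, here is a minimal sketch of what happens to getchar()'s result when it is squeezed into an unsigned char. The helper names stored_in_uchar and is_escape are made up for illustration, and the sketch assumes 8-bit bytes and EOF == -1 (the usual value, but the standard only promises that it is negative).

```c
#include <stdio.h>

/* Hypothetical helper: models "unsigned char c = getchar();".
 * Only the low 8 bits survive, so the result is always in 0..255.
 * Assumes 8-bit bytes. */
int stored_in_uchar(int getchar_result)
{
    unsigned char c = (unsigned char)getchar_result;
    return c;
}

/* Hypothetical helper: the escape test from the snippet, applied to
 * whatever value ended up in c. */
int is_escape(int c)
{
    return c == 0xFF;
}
```

With c kept as an int, is_escape(EOF) is false and is_escape(0xFF) is true, so the two cases stay separate. After narrowing, and assuming EOF == -1, stored_in_uchar(EOF) is 0xFF, so end of input looks exactly like the escape byte, and a later comparison against EOF can never succeed.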

The code snippet itself is run-length encoding with 0xFF as the escape byte.

On decompression, the first byte after the escape is the number of repeats and the second is the byte to repeat.
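
A rough sketch of that decoding scheme, working on memory buffers instead of stdin. This is not the article's code: the function name rle_decode, the error convention, and the rule that any non-escape byte is copied through unchanged are all assumptions on my part, since the quoted snippet only shows the escape branch.

```c
#include <stddef.h>

/* Hypothetical decoder for the format described above:
 *   0xFF, count, value  -> value repeated count times
 *   any other byte      -> copied as-is (assumed; not shown in the snippet)
 * Returns the number of bytes written to dst, or (size_t)-1 if the input
 * ends in the middle of an escape sequence or dst is too small. */
size_t rle_decode(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    while (in < src_len) {
        unsigned char c = src[in++];

        if (c == 0xFF) {
            /* An escape must be followed by a count byte and a value byte. */
            if (src_len - in < 2)
                return (size_t)-1;
            size_t len = src[in++];
            unsigned char value = src[in++];

            if (dst_cap - out < len)
                return (size_t)-1;
            while (len--)
                dst[out++] = value;
        } else {
            if (out == dst_cap)
                return (size_t)-1;
            dst[out++] = c;
        }
    }
    return out;
}
```

For example, the input bytes 'a', 0xFF, 3, 'b', 'c' decode to "abbbc". Note that unsigned char is fine here because the end of input is tracked by src_len rather than by an in-band EOF value; the int requirement only applies when reading through getchar().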
Ah, thanks!
Specifically this is known as "Run Length Encoding".