| HN Mirror

> Do these actually run faster than the code produced by a good compiler? Benchmarks?

Some of these are absolutely suspect. Now, these are all architecture specific: my results below are for gcc, targeting the Intel chip in the MBP in my lap. (Of course, some of the article's code samples assume certain things about an architecture, and are not portable.)

For example,

> Absolute Value of a Float

> IEEE floating point uses an explicit sign bit, so the absolute value can be taken by a bitwise AND with the complement of the sign bit. For IA32 32-bit, the sign bit is an int value of 0x80000000, for IA32 64-bit, the sign bit is the long long int value 0x8000000000000000. Of course, if you prefer to use just int values, the IA32 64-bit sign bit is 0x80000000 at an int address offset of +1. For example:

  double x;

  /* make x = abs(x) */
  *(((int *) &x) + 1) &= 0x7fffffff;

First, this is undefined behavior. The portable way to do this is fabs(), which is part of the standard library. For me, calling fabs() compiles to a single instruction:

  andps   LCPI1_0(%rip), %xmm0

where LCPI1_0 is

  .quad   9223372036854775807     ## 0x7fffffffffffffff
  .quad   9223372036854775807     ## 0x7fffffffffffffff

fabs() compiled down to essentially the same bitwise &, in assembly, that the article is doing. The article's "advice" also gets compiled to exactly the same instruction & set of constants, which I actually find way more interesting.

> Integer Constant Multiply

> Given an integer value x and an integer or floating point value y, the value of xy can be computed efficiently using a sequence derived from the binary value of x. For example, if x is 5 (4 + 1):*

  y2 = y + y;
  y4 = y2 + y2;
  result = y + y4;

> In the special case that y is an integer, this can be done with shifts:

  y4 = (y << 2);
  result = y + y4;

Compilers will do this for you, today, automatically. In the example case of multiplying by 5, gcc will do better than the article's code, by using lea cleverly:

  leal    (%rdi,%rdi,4), %eax

gcc also compiles the article's shift-then-add strategy to exactly the same leal.

The article mentions this optimization again, later,

> Shift-and-Add Optimization

> Rather obviously, if an integer multiply can be implemented by a shift-and-add sequence, then a shift-and-add sequence can be implemented by multiplying by the appropriate constant... with some speedup on processors like the AMD Athlon. Unfortunately, GCC seems to believe constant multiplies should always be converted into shift-and-add sequences, so there is a problem in using this optimization in C source code.

The above leal instruction was the output of GCC, so this is obviously wrong. One can maybe say the article is old, but I'm pretty sure GCC has had the "use leal to do constant multiplies" optimization for over a decade now, but then, perhaps the article is that old. If so, the advice would appear to no longer apply.

Write it the obvious, readable way first. If you decide to optimize at the level this article is discussing, you should be reading the assembly to see if your optimizations will have any effect.

But not all of them are bad. The "is this unsigned integer a power of 2" code is correct, I believe, and if you understand binary, not that hard to understand.