Hacker News new | ask | show | jobs
by Agentlien 1586 days ago
I just want to second this with my own experience just now:

I looked at the title while still waking up.

At first I thought of the low + (high - low) / 2 method. I then figured maybe it was better to simply predivide both numbers before adding and just correcting for the lowest bit (how was that ever patented?!).

However, I didn't like having to perform two divisions so I thought there was probably something clever one could do with bit operations to avoid it. But, still being tired, I decided I didn't want to actually spend time thinking on the problem and I'd already spent a minute on it.

1 comments

x / 2 === x >> 1, it's fast.
For unsigned or positive x. Yes, the article is about unsigned integers, but some might see this for the first time and not be aware of this restriction. -3 / 2 == -1 but -3 >> 1 == -2.
Interesting thing is that `(a >> 1) + (b >> 1) + (a & b & 1)` works correctly for signed integers (if `>>` works like in Java, filling most significant bit with ones for negative numbers). With division you'll need to write different expressions depending on operand signs. E.g. (-3) / 2 + (-5) / 2 + ((-3) & (-5) & 1) = (-1) + (-2) + 1 = -2. But ((-3) >> 1) + ((-5) >> 1) + ((-3) & (-5) & 1) = (-2) + (-3) + 1 = -4.
You can use the appropriate right shift with signed integers as easy as with unsigned integers, you just have to handle in the right way the correction due to the bit shifted out.

The fact that the right shift for a negative integer gives the floor function of the result just makes the correction easier than if you had used division with truncation towards zero.

The shifted out bit is always positive, regardless whether the shift had been applied to negative or positive numbers.

Except for following a tradition generated by a random initial choice, programming would have been in many cases easier if the convention for the division of signed numbers would have been to always generate positive remainders, instead of generating remainders with the same sign as the quotient.

With positive remainders you get wired quotient behavior. Why should 10/3 and -10/-3 yield different results? Besides that, the choice is not universal, different languages use different conventions.
Why should 10/3 and -10/-3 yield the same result?

I do not see where this would be of any use.

On the other hand, if you want a quotient that has some meaningful relationship with the ratio between the dividend and the divisor, there are other more sensible definitions of the integer division than the one used in modern programming languages.

You can have either a result that is a floating point number even for integer dividend and divisor, like in Algol, or you can define the division to yield the quotient rounded to even (i.e. with a remainder that does not exceed half of the divisor).

In both cases 10/3 and -10/-3 would yield the same result and I can imagine cases when that would be useful.

For the current definition of the integer division, I do not care whether 10/3 and -10/-3 yield the same result. It does not simplify any algorithm that I am aware of, while having a remainder of a known sign simplifies some problems by eliminating some tests for sign.

I was not really thinking about application but the mathematics. It seems a reasonable decision to me that |a / b| = |a| / |b| and to not get results of different magnitude depending on sign changes only.
It's fast, but I figured doing that on both sides before adding looked a bit inelegant and maybe it could be avoided by doing "something something bit operations" and then I dropped the thought and clicked the link.
On a modern architecture given that most integers are usually u32 by default but the underlying CPU deals with 64bits natively, I'd just cast to u64 and call it a day.

Actually I was curious to see if GCC would be smart enough to automatically choose what's the best optimization depending on the underlying architecture, but it doesn't appear to be the case.

For x86_64 (with -O3 or -Os):

    avg_64bits:
    .LFB0:
        .cfi_startproc
        movl    %edi, %edi
        movl    %esi, %esi
        leaq    (%rdi,%rsi), %rax
        shrq    %rax
        ret
        .cfi_endproc

    avg_patented_do_not_steal:
   .LFB1:
        .cfi_startproc
        movl    %edi, %eax
        movl    %esi, %edx
        andl    %esi, %edi
        shrl    %eax
        shrl    %edx
        andl    $1, %edi
        addl    %edx, %eax
        addl    %edi, %eax
        ret
Clearly just casting to 64bits seems to denser code

For ARM32 (-O3 and -Os):

    avg_64bits:
        push    {fp, lr}
        movs    r3, #0
        adds    fp, r1, r0
        adc     ip, r3, #0
        mov     r0, fp
        mov     r1, ip
        movs    r1, r1, lsr #1
        mov     r0, r0, rrx
        pop     {fp, pc}

    avg_patented_do_not_steal:
        and     r3, r1, #1
        ands    r3, r3, r0
        add     r0, r3, r0, lsr #1
        add     r0, r0, r1, lsr #1
        bx      lr
A lot more register spilling in the 64bit version since it decides to do a true 64bit add using two registers and an adc.

My code, for reference:

    uint32_t avg_64bits(uint32_t a, uint32_t b) {
      uint64_t la = a;
      uint64_t lb = b;
    
      return (la + lb) / 2;
    }

    uint32_t avg_patented_do_not_steal(uint32_t a, uint32_t b) {
        return (a / 2) + (b / 2) + (a & b & 1);
    }