| HN Mirror


	by mgaunard 1206 days ago
	Compilers already reduce integer division by a constant into these sequences. The algorithms matter more when you're dividing by a value that is only known at runtime but stays the same across parts of the program. That's where libdivide comes in.
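As a rough illustration of what reducing division by a constant looks like, here is a minimal C sketch for unsigned 32-bit division by 3. The name `div3` is made up for this example; the constant 0xAAAAAAAB is ceil(2^33 / 3), and a real compiler may emit a different but equivalent sequence.

```c
#include <stdint.h>

/* Hypothetical helper: n / 3 without a divide instruction.
 * 0xAAAAAAAB == ceil(2^33 / 3), so shifting the 64-bit product
 * right by 33 yields the exact quotient for every 32-bit n. */
static inline uint32_t div3(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}
```

The multiply and shift replace the divide entirely, which is only possible here because the divisor is fixed when the code is written.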

2 comments

mncharity 1206 days ago

libdivide tl;dr:

> libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster. [...] divide SIMD vectors by runtime constants, [1]

> libdivide.h is a header-only C/C++ library for optimizing integer division. Integer division is one of the slowest instructions on most CPUs e.g. on current x64 CPUs a 64-bit integer division has a latency of up to 90 clock cycles whereas a multiplication has a latency of only 3 clock cycles. libdivide allows you to replace expensive integer division instructions by a sequence of shift, add and multiply instructions that will calculate the integer division much faster.

> On current CPUs you can get a speedup of up to 10x for 64-bit integer division and a speedup of up to 5x for 32-bit integer division when using libdivide. libdivide also supports SSE2, AVX2 and AVX512 vector division which provides an even larger speedup. You can test how much speedup you can achieve on your CPU using the benchmark program.[2]

[1] https://libdivide.com/

[2] https://github.com/ridiculousfish/libdivide

int_19h 1206 days ago

The obvious question this invites: if this is generally faster for unknown values, why don't compilers use this optimization directly in emitted code?

moonchild 1206 days ago

They could, but it's less profitable, and cost models are, as always, a big problem. If we have:

  loop for i from 0 to n
      f(a[i]/y)

How big does n have to be before it makes sense to compute a reciprocal for y? And how frequently is it actually that big during the runtime of the code?
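To make the trade-off concrete, here is a minimal C sketch of the hoisted version of that loop. It is not libdivide's implementation: the names `recip_u32`, `div_by_recip` and `sum_quotients` are hypothetical, summation stands in for `f`, and the code assumes GCC or Clang on a 64-bit target for `unsigned __int128`. The magic value is ceil(2^64 / y), a known multiply-high technique for 32-bit unsigned operands.

```c
#include <stddef.h>
#include <stdint.h>

/* One-time setup: this itself costs a real 64-bit divide.
 * Requires d >= 2; for d == 1 the result would wrap around to 0. */
static inline uint64_t recip_u32(uint32_t d) {
    return UINT64_MAX / d + 1;            /* ceil(2^64 / d) */
}

/* n / d via the precomputed magic: high 64 bits of a 128-bit product. */
static inline uint32_t div_by_recip(uint32_t n, uint64_t magic) {
    return (uint32_t)(((unsigned __int128)magic * n) >> 64);
}

/* Sums a[i] / y with the reciprocal hoisted out of the loop.
 * Precondition: y != 0. */
uint64_t sum_quotients(const uint32_t *a, size_t n, uint32_t y) {
    uint64_t total = 0;
    if (y == 1) {                         /* x / 1 == x, no magic needed */
        for (size_t i = 0; i < n; i++) total += a[i];
        return total;
    }
    uint64_t magic = recip_u32(y);        /* paid once, however small n is */
    for (size_t i = 0; i < n; i++)
        total += div_by_recip(a[i], magic);
    return total;
}
```

The setup divide is paid even when `n` is 0 or 1, which is the cost-model question: the hoisted form only wins once the loop runs often enough to amortize it.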

_a_a_a_ 1206 days ago

> Compilers usually do this, but only when the divisor is known at compile time

If the divisor is in a variable or otherwise 'hidden', my guess is that the compiler can't deduce enough to get at it.

baruch 1206 days ago

Calculating the divisor values is also expensive, so this only pays off when you do that work once and then perform the efficient divides many times.

eru 1206 days ago

See https://godbolt.org/z/3Yqbceaza for what godbolt says clang produces.

Keep in mind that this doesn't use the fact that we know that the input is between 0 and 63.

workaccount1 1206 days ago

> Keep in mind that this doesn't use the fact that we know that the input is between 0 and 63.

You can use __builtin_assume for this: https://godbolt.org/z/K4jKhxnTq

pantalaimon 1206 days ago

An assert() also does the trick: https://godbolt.org/z/MecvMGPdW

Edit: uh, but when asserts are disabled it won't work: https://godbolt.org/z/4TMs1Wc5z

unless you roll your own assert: https://godbolt.org/z/4v35rrTvn
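The linked snippets are not reproduced here, but the general idea of a hand-rolled assert that still informs the optimizer when checks are compiled out might look like the sketch below. The macro name `ASSUME` and the function `scale` are hypothetical, the expression `9 * x / 64` is borrowed from elsewhere in the thread as a stand-in, and the code assumes Clang (`__builtin_assume`) or GCC (`__builtin_unreachable`).

```c
#include <stdint.h>

/* Tell the optimizer that cond holds, without emitting a runtime check.
 * Violating the assumption is undefined behavior. */
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)   /* unknown compiler: no hint available */
#endif

/* Hypothetical example: the caller guarantees 0 <= x <= 63. */
uint32_t scale(uint32_t x) {
    ASSUME(x < 64);
    return 9 * x / 64;
}
```

Unlike `assert()`, this form does not disappear under `NDEBUG`, so the range information stays available to the compiler; the price is that a wrong assumption is no longer caught at runtime.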

RenThraysk 1206 days ago

9*x/64 still reduces to 2 instructions

https://godbolt.org/z/6WsWqh4ah
