Hacker News new | ask | show | jobs
by lntue 1114 days ago
So in the implementation of cos_table_*_LERP, you did technically 2 step range reduction:

1. Reduce x = x mod 2*pi

2. Reduce index = floor(x / 10^-n), and i - index = 10^n * (x mod 10^-n)

With limited input range and required precision as in the tests, you can combine these 2 range reduction steps:

1. Choose the reduced range as power of 2 instead of power of 10 for cheaper modulus operation, let say 2^-N = 2^-7.

2. Avoid the division in modd(x, CONST_2PI) by multiplying by 2^N / pi.

3. Avoid the round trip double -> int -> double by using the floor function / instruction.

Here is the updated version of cos_table_*_LERP which should have higher throughput and lower latency:

  double cos_table_128_LERP(double x) {
    x = fabs(x);
    double prod = x * TWO_TO_SEVEN_OVER_PI;
    double id = floor(prod);
    double x = prod - id;  /* after this step, 0 <= x < 1 */
    int i = ((int)id) & 0xFF; /* i = id mod 2^8 */
    return lerp(x, COS_TABLE[i], COS_TABLE[i + 1]);
  }

You can also optimize lerp a bit more with the formula:

  lerp(w, v1, v2) = (1 - w) * v1 + w * v2 = w * (v2 - v1) + v1
We do employ this range reduction strategy in a more accurate way for trig functions in LLVM libc:

  - with FMA instructions: https://github.com/llvm/llvm-project/blob/main/libc/src/math/generic/range_reduction_fma.h

  - without FMA instructions: https://github.com/llvm/llvm-project/blob/main/libc/src/math/generic/range_reduction.h