|
|
|
|
|
by qooiii2
3318 days ago
|
|
You can do a little better with a minimax polynomial that you can derive with the Remez algorithm. If you have Matlab, you can use the excellent Chebfun package's remez command to do this almost instantly for any well-behaved function. The best approximation I could find calculated sin(x) and cos(x) for [-pi/4,pi/4] and then combined these to cover all other inputs. These functions need only 4 or 5 terms to get +/-1 ULP accuracy for 32-bit floats. I thought the harder problem was the argument reduction to enable calculating something like sin(10^9). It turns out that you need a lot of bits of pi to do this accurately. While testing my implementation, I was surprised to learn that the x87 fsin implementation only has 66 bits of pi, so it gives extremely wrong answers for large inputs. |
|
Yup! This was something I meant to mention in a footnote. If I could get equivalent [infinite precision] error bounds using a minimax polynomial of a lower degree, I think this would make a difference. Otherwise, even if the theoretical error bounds are lower for a minimax polynomial of the same degree, it wouldn't do anything to combat the noise introduced by rounding errors, assuming you keep everything else the same - and that noise seems to be by far the limiting factor. But that's just conjecture - no way to know for certain without trying it.
I do still like the Chebyshev polynomial approach because it's the only non-iterative approach I know of. It's something that I could implement without the aid of a CAS program pretty trivially, if I wanted to.
> The best approximation I could find calculated sin(x) and cos(x) for [-pi/4,pi/4] and then combined these to cover all other inputs.
Yes, that also seems to be the way to go (at least for precision). I didn't mention it in the article, but even though the code I showed was rust, the actual implementation is in the form of an AST for a plugin system that only supports f32 addition, multiplication, division, and modulo as operations (no branching, f32<->int conversions, etc). My entire motivation was to create a sine function for this system. It's known that the inputs to the function will be in (-pi, pi), so I don't have to perform reduction over any larger range. I don't see a practical way to reduce down to (-pi/4, pi/4) using the operations available to me, but I could be overlooking something.