Hacker News new | ask | show | jobs
by potiuper 1540 days ago
https://github.com/ifduyue/musl/blob/master/src/math/__cos.c
3 comments

For anyone else confused by this, the control logic described in the comment happens in https://github.com/ifduyue/musl/blob/master/src/math/cos.c
> Input y is the tail of x.

What does "tail" mean in this context?

It is a double-double representation [1], where a logical fp number is represented with a sum of two machine fp numbers x (head, larger) and y (tail, smaller). This effectively doubles the mantissa. In the context of musl this representation is produced from the range reduction process [2].

[1] https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...

[2] https://github.com/ifduyue/musl/blob/6d8a515796270eb6cec8a27...

Does it make sense to use a double-double input when you only have double output? Sine is Lispchitz-limited by 1 so I don't see how this makes a meaningful difference.
The input might be double but the constant pi is not. Let f64(x) be a function from any real number to double, so that an ordinary expression `a + b` actually computes f64(a + b) and so on. Then in general f64(sin(x)) may differ from f64(sin(f64(x mod 2pi))); since you can't directly compute f64(sin(x mod 2pi)), you necessarily need more precision during argument reduction so that f64(sin(x)) = f64(sin(f64timeswhatever(x mod 2pi))).
But am I correct in thinking that is at worst a 0.5 ulp error in this case? The lesser term in double-double can't be more than 0.5 ulp of the greater term and sensitivity of both sine and cosine to an error in the input will not be more than 1.

Also, in case of confusion, I was specifically commenting on the function over the [-pi/4, pi/4] domain in https://github.com/ifduyue/musl/blob/master/src/math/__cos.c , which the comment in https://news.ycombinator.com/item?id=30846546 was presumably about.

Yeah, sine and cosine are not as sensitive (but note that many libms target 1 or 1.5 ulp error for them, so a 0.5 ulp error might still be significant). For tangent however you definitely need more accurate range reduction.
Double rounding can still bite you. You are forced to incur up to half an ulp of error from your polynomial, so taking another half ulp in your reduction can lead to a total error of about 1 ulp.
I think the polynomial calculation in the end looks interesting. It doesn't use Horner's rule.
It does use Horner's rule, but splits the expression into two halves in order to exploit instruction-level parallelism.
Considering the form of both halves is the same, are compilers smart enough to vectorize this code?
I might be wrong but I would think for something like this vectorizing wouldn't save time (since you would have to move data around before and afterwards. The real benefit of this is it lets you run two fma operations in parallel.