It is a double-double representation [1], where a logical fp number is represented as the unevaluated sum of two machine fp numbers: x (the head, larger in magnitude) and y (the tail, smaller). This effectively doubles the number of mantissa bits. In musl, this representation is produced by the range reduction step [2].
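To make the head/tail idea concrete, here is a minimal Python sketch of Knuth's TwoSum, one standard way to turn an addition into such a pair. It illustrates the representation only and is not musl's code; the name `two_sum` is chosen here. With round-to-nearest, the tail is at most half an ulp of the head, and head + tail equals the exact sum.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: return (head, tail) where head = fl(a + b) and
    head + tail == a + b exactly (assuming no overflow)."""
    head = a + b
    b_virtual = head - a
    a_virtual = head - b_virtual
    tail = (a - a_virtual) + (b - b_virtual)
    return head, tail


# A single double drops the small part entirely...
print(1.0 + 1e-20 == 1.0)   # True
# ...but the (head, tail) pair keeps it.
print(two_sum(1.0, 1e-20))  # (1.0, 1e-20)

# The pair represents the exact rational sum of the inputs.
h, t = two_sum(0.1, 0.2)
print(Fraction(h) + Fraction(t) == Fraction(0.1) + Fraction(0.2))  # True
```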
Does it make sense to use a double-double input when you only produce a double output? Sine is Lipschitz continuous with constant 1, so I don't see how this makes a meaningful difference.
The input might be a double, but the constant pi is not. Let f64(x) be the function that rounds any real number to the nearest double, so that an ordinary expression `a + b` actually computes f64(a + b), and so on. In general, f64(sin(x)) may differ from f64(sin(f64(x mod 2pi))). Since you can't directly compute f64(sin(x mod 2pi)), you necessarily need more precision during argument reduction, so that f64(sin(x)) = f64(sin(f64timeswhatever(x mod 2pi))), where f64timeswhatever rounds to some format wider than double (a double-double, for instance).
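A minimal Python sketch of this point, using x = 1e22 (which is exactly representable as a double). It reduces once by the double `2 * math.pi` and once in exact rational arithmetic against a 50-digit value of pi. The exact path is only a stand-in for a real wide-precision reduction (musl reduces by pi/2 with its own method), and `reduce_exact`/`reduce_naive` are hypothetical names.

```python
import math
from fractions import Fraction

# pi to 50 decimal places: ample for reducing arguments around 1e22.
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")
TWO_PI = 2 * PI_50


def reduce_exact(x: float) -> float:
    """f64(x mod 2pi), with the remainder computed in exact rational arithmetic."""
    fx = Fraction(x)             # exact value of the double
    k = math.floor(fx / TWO_PI)  # number of whole periods
    return float(fx - k * TWO_PI)


def reduce_naive(x: float) -> float:
    """Reduce by the double 2*math.pi, which is not exactly 2pi.
    fmod itself is exact; the error comes from the wrong period."""
    return math.fmod(x, 2 * math.pi)


x = 1e22
print(math.sin(reduce_exact(x)))  # close to -0.8522008497671888
print(math.sin(reduce_naive(x)))  # nowhere near sin(1e22)
```

The naive version goes wrong because 2*math.pi differs from 2pi by about 2.4e-16, and that error is multiplied by the roughly 1.6e21 periods contained in 1e22.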
But am I correct in thinking that this is at worst a 0.5 ulp error in this case? The tail of a double-double can't be more than 0.5 ulp of the head, and the sensitivity of both sine and cosine to an error in the input is at most 1, since their derivatives never exceed 1 in magnitude.
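The argument in the question can be written out with the mean value theorem. Note that the bound is in ulps of the input x; for sine, since |sin x| ≤ |x|, half an ulp of x is at least half an ulp of sin x and can be more.

```latex
% x = head, y = tail of the double-double, with |y| <= ulp(x)/2.
% Mean value theorem: for some \xi between x and x + y (a different \xi in each line),
\begin{align*}
|\sin(x + y) - \sin x| &= |\cos \xi|\,|y| \le |y| \le \tfrac12 \operatorname{ulp}(x), \\
|\cos(x + y) - \cos x| &= |\sin \xi|\,|y| \le |y| \le \tfrac12 \operatorname{ulp}(x).
\end{align*}
```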
Yeah, sine and cosine are not as sensitive (but note that many libms target 1 or 1.5 ulp error for them, so a 0.5 ulp error might still be significant). For tangent, however, you definitely need more accurate range reduction, since its derivative 1/cos²(x) is unbounded near pi/2.
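A small Python illustration of why tangent is the harder case: near pi/2, a one-ulp change in the argument leaves sine unchanged at double precision but moves tangent by a large factor. This only demonstrates sensitivity and is not a range-reduction routine.

```python
import math

x = math.pi / 2                # nearest double to pi/2, about 6.1e-17 below it
x_prev = math.nextafter(x, 0)  # one ulp (about 2.2e-16) smaller

# Sine barely moves: its derivative is bounded by 1 (both values round to 1.0 here).
print(math.sin(x) - math.sin(x_prev))

# Tangent changes by a factor of about 4.6, because 1/cos(x)**2 is huge this close to the pole.
print(math.tan(x) / math.tan(x_prev))
```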
Double rounding can still bite you. Even a perfect polynomial leaves up to half an ulp of error, because its result has to be rounded to double, so taking another half ulp in your reduction can lead to a total error of about 1 ulp.
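A Python experiment can make this error budget concrete. It assumes an idealized kernel whose only error is the final rounding (the "perfect polynomial"), computes reference values with a 60-digit Taylor series via `decimal`, and measures the error in ulps of the result when the tail is dropped versus kept. The sampling range, the seed, and the helpers `sin_ref` and `err_ulps` are choices made for this sketch, not anything taken from musl.

```python
import math
import random
from decimal import Decimal, localcontext


def sin_ref(head: float, tail: float) -> Decimal:
    """sin(head + tail) to about 60 digits via Taylor series (fine for |x| < 1)."""
    with localcontext() as ctx:
        ctx.prec = 60
        x = Decimal(head) + Decimal(tail)
        x2 = x * x
        term = total = x
        for k in range(1, 30):
            term = -term * x2 / ((2 * k) * (2 * k + 1))
            total += term
        return total


def err_ulps(approx: float, exact: Decimal) -> float:
    """|approx - exact| in ulps of the correctly rounded result."""
    return float(abs(Decimal(approx) - exact)) / math.ulp(float(exact))


random.seed(1)
worst_drop = worst_keep = 0.0
for _ in range(2000):
    head = random.uniform(0.5, 0.78)                   # a reduced argument
    tail = random.uniform(-0.5, 0.5) * math.ulp(head)  # |tail| < 0.5 ulp(head)
    exact = sin_ref(head, tail)
    drop = float(sin_ref(head, 0.0))  # perfect kernel that ignores the tail
    keep = float(exact)               # perfect kernel that uses the tail
    worst_drop = max(worst_drop, err_ulps(drop, exact))
    worst_keep = max(worst_keep, err_ulps(keep, exact))

print(worst_drop, worst_keep)  # worst_drop exceeds 0.5 ulp; worst_keep does not
```

The tail's contribution and the final rounding error add up when their signs agree, which is why the dropped-tail worst case lands between 0.5 and about 1 ulp or slightly more.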
I might be wrong, but I would think that for something like this vectorizing wouldn't save time, since you would have to move data around before and afterwards. The real benefit of this is that it lets you run two fma operations in parallel.
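A sketch of the kind of restructuring that enables this, assuming the code under discussion splits a polynomial into even and odd parts. Horner's rule is one long dependency chain, while splitting in w = z*z gives two shorter chains whose multiply-adds don't depend on each other, so an out-of-order core can issue them in parallel with no vector shuffling. Python can't show the timing or emit fma instructions; the coefficients are plain Taylor coefficients of sin(x)/x, not musl's minimax set, and `horner`/`split` are hypothetical names.

```python
import math

# Taylor coefficients of sin(x)/x as a polynomial in z = x*x (illustrative only).
C = [(-1) ** k / math.factorial(2 * k + 1) for k in range(7)]  # c0..c6


def horner(z: float) -> float:
    """One dependent chain: each multiply-add waits for the previous one."""
    acc = C[6]
    for c in reversed(C[:6]):
        acc = acc * z + c  # in C, each step could be a single fma
    return acc


def split(z: float) -> float:
    """Two independent chains in w = z*z, combined at the end."""
    w = z * z
    even = ((C[6] * w + C[4]) * w + C[2]) * w + C[0]  # c0 + c2 w + c4 w^2 + c6 w^3
    odd = (C[5] * w + C[3]) * w + C[1]                # c1 + c3 w + c5 w^2
    return odd * z + even


x = 0.5
z = x * x
print(x * horner(z), x * split(z), math.sin(x))  # all three agree to about 1e-16
```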