Actually, pretty much everyone implements double-precision sin/cos using the same (IIRC) pair of polynomials, each with six fitted coefficients in x² (so degree 13 for sin and degree 14 for cos). The same SunPro code exists unchanged in essentially every C library everywhere. It's just a fitted curve, with no fancy series definition beyond what appears in the output coefficients. One polynomial covers the "mostly linear" segment where the curve crosses the origin, and the other covers the "mostly parabolic" peak of the curve.
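To make the two-kernel idea concrete, here is a minimal Python sketch in the style of fdlibm's `__kernel_sin` / `__kernel_cos`, using the S1..S6 and C1..C6 coefficients as they appear in fdlibm's k_sin.c and k_cos.c (worth double-checking against your own copy). It deliberately simplifies: the real kernels also take a tail term from argument reduction and treat cos more carefully for |x| > 0.3, and the naive reduction here (the hypothetical `sin_cos` helper) is only accurate for modest |x|, whereas the real library uses Cody-Waite and Payne-Hanek reduction.

```python
import math

# Fitted (minimax) coefficients as found in fdlibm's k_sin.c and k_cos.c.
S1 = -1.66666666666666324348e-01
S2 = 8.33333333332248946124e-03
S3 = -1.98412698298579493134e-04
S4 = 2.75573137070700676789e-06
S5 = -2.50507602534068634195e-08
S6 = 1.58969099521155010221e-10

C1 = 4.16666666666666019037e-02
C2 = -1.38888888888741095749e-03
C3 = 2.48015872894767294178e-05
C4 = -2.75573143513906633035e-07
C5 = 2.08757232129817482790e-09
C6 = -1.13596475577881948265e-11


def kernel_sin(x):
    """sin(x) for |x| <= pi/4: the 'mostly linear' odd polynomial x + x^3 * P(x^2)."""
    z = x * x
    r = S2 + z * (S3 + z * (S4 + z * (S5 + z * S6)))
    return x + x * z * (S1 + z * r)


def kernel_cos(x):
    """cos(x) for |x| <= pi/4: the 'mostly parabolic' even polynomial 1 - x^2/2 + x^4 * Q(x^2)."""
    z = x * x
    r = z * (C1 + z * (C2 + z * (C3 + z * (C4 + z * (C5 + z * C6)))))
    return 1.0 - 0.5 * z + z * r


def sin_cos(x):
    """Hypothetical helper: naive reduction x = y + k*pi/2 with |y| <= pi/4.

    Only accurate for modest |x|; the subtraction loses precision for large k.
    """
    k = round(x / (math.pi / 2))
    y = x - k * (math.pi / 2)
    s, c = kernel_sin(y), kernel_cos(y)
    q = k % 4  # which quadrant the original angle fell in
    if q == 0:
        return s, c
    if q == 1:
        return c, -s
    if q == 2:
        return -s, -c
    return -c, s
```

Note that the only operations in the kernels are multiplies and adds evaluated in Horner form, which is why the cost of a multiply relative to an add matters in the discussion that follows.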
Yeah, but 52 adds can be a lot cheaper than a few multiplies if you're making them out of shift registers and logic gates (or LUTs). In a CPU or GPU, who cares, moving around the data is 100x more expensive than the ALU operation.
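The parent comment isn't part of this excerpt, but "52 adds" suggests a shift-and-add method such as CORDIC, which spends roughly one iteration per bit of a double's 52-bit fraction. Assuming that's what's meant, here is a minimal fixed-point sketch: the loop body uses only shifts, adds/subtracts, and a sign test, which is what makes it attractive in plain logic. The 52-fraction-bit format, the iteration count, and the name `cordic_sin_cos` are illustrative choices, not a specific hardware design.

```python
import math

FRAC_BITS = 52          # illustrative fixed-point format: value = integer / 2**52
N_ITERS = 52            # roughly one iteration per bit of precision
ONE = 1 << FRAC_BITS

# Precomputed constants (in hardware these would live in a small ROM/LUT).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITERS)]
_gain = 1.0
for _i in range(N_ITERS):
    _gain *= 1.0 / math.sqrt(1.0 + 4.0 ** -_i)
K = round(_gain * ONE)  # starting x pre-scaled by the CORDIC gain, ~0.6073


def cordic_sin_cos(theta):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos).

    The float<->fixed conversions at the edges stand in for an input/output
    that would already be fixed-point in hardware.
    """
    z = round(theta * ONE)  # remaining angle to rotate through
    x, y = K, 0
    for i in range(N_ITERS):
        # Each step rotates by +/- atan(2**-i) using only shifts and adds.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE
```

Each iteration is really two shifts and three add/subtracts, so "52 adds" is shorthand for 52 cheap iterations. The trade-off is latency: a polynomial kernel needs only a handful of multiplies, but each of those is a large multiplier array in silicon.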
> in a CPU or GPU, who cares, moving around the data is 100x more expensive than the ALU operation
Moving data is indeed expensive, but there’s another reason not to care: modern CPUs take the same time to add or multiply floats.
For example, the computer I’m using, with AMD Zen3 CPU cores, has a 3-cycle latency for both adding and multiplying, and that applies to both the 32- and 64-bit flavors of floats. See the addps, mulps, addpd, and mulpd SSE instructions in this table: https://www.uops.info/table.html