Hacker News new | ask | show | jobs
by stephencanon 1151 days ago
> for that there's still no better approach than using the Q_rsqrt idea

Or you can just take a square root and divide; these are (partially-)pipelined operations on modern CPUs, with _much_ shorter latency than they had when "fast inverse square root" was a thing.

It still has a niche, but that niche is very, very narrow today.

2 comments

If you're doing ~20 other FP ops (not including division) per inverse square root and not using a huge dependency chain of inverse square roots, you might not even notice the impact of using an exact inverse square root with this method.

Short of functions like exp and the trig functions, DIY approximations are usually not worth it these days. However, FP division getting faster has made these functions faster, too, since they can now use rational approximations rather than pure polynomials.

polynomials are still king. a 6th degree polynomial is often enough and will be faster than a 2nd degree rational.
A 6th degree polynomial is firmly in the "fast approximation" category. I think sin/cos are ~15th degree polynomials for 1 ULP of precision.
sin/cos commonly use 2 different 5th degree polynomials (depending on quadrant of the sin graph) https://github.com/JuliaLang/julia/blob/bb83df1d61fb649efd15...
https://github.com/rutgers-apl/The-RLIBM-Project/blob/main/l... has a lot less and claims to be correctly rounded.
For 32-bit float, yes. For double precision, not so much.
What do you mean by "taking the square root"? sqrtps has exactly the same problem, it's not standardized, so you need a custom algorithm.
Sqrt is standardized by IEEE, it is required to be the correctly rounded value.
square root is an IEEE 754 basic operation, correctly rounded (and therefore bitwise reproducible) on all compliant systems. What made you think that it isn't?