by nhellman 1151 days ago
This is a good point. I updated the article to include a comparison where the naive method uses only standardized floating-point operations. Without -funsafe-math-optimizations, the compiler emits sqrtps followed by divps (sqrtps seems to implement the IEEE 754 square root).

In this case, Q_rsqrt actually seems to provide a 2-4x speedup over the reproducible naive method.