by nhellman
1151 days ago
This is a good point. I updated the article to include a comparison where the naive method uses only standardized floating-point operations. Without -funsafe-math-optimizations, the compiler emits sqrtps followed by divps (sqrtps seems to implement the IEEE 754 square root). In this case, Q_rsqrt actually seems to provide a 2-4x speedup compared to the reproducible naive method.