by stephencanon
1023 days ago
Square root is pretty much equivalent to division in complexity, and is computed by similar techniques (digit-by-digit methods, or Newton-Raphson or Goldschmidt iterations). Division is often a little more efficient, but square root has fewer messy edge cases (it never overflows or underflows). Division and square root are generally slower than the other arithmetic operations, in both latency and throughput. They are finally partially pipelined in recent CPUs (a result every two or three cycles), but were totally unpipelined in mainstream designs for many years before that. A decade ago, they might take a few tens of cycles; now they’re generally somewhere around ten cycles of latency on “real” CPUs, vs. 3-5 cycles of latency for the other floating-point arithmetic instructions.
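To make the "similar techniques" point concrete, here is a rough software sketch of the Newton-Raphson flavor for both operations. It is only an illustration, not how any particular CPU does it: the function names `newton_reciprocal` and `newton_rsqrt`, the starting estimates, and the iteration counts are all assumptions chosen for readability (real hardware typically starts from a small lookup table and runs far fewer steps). Both routines assume a positive, finite, normal input.

```python
import math

def newton_reciprocal(d, iterations=5):
    """Approximate 1/d for d > 0 (hypothetical helper, illustration only)."""
    m, e = math.frexp(d)                # d = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - 32.0 / 17.0 * m   # linear first guess at 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)           # Newton step for f(x) = 1/x - m
    return math.ldexp(x, -e)            # undo the scaling: 1/d = (1/m) * 2**-e

def newton_rsqrt(a, iterations=10):
    """Approximate 1/sqrt(a) for a > 0 (hypothetical helper, illustration only)."""
    m, e = math.frexp(a)                # a = m * 2**e with 0.5 <= m < 1
    if e % 2:                           # make the exponent even so it halves exactly
        m *= 2.0                        # now 1 <= m < 2
        e -= 1
    y = 1.0                             # crude first guess; fine for 0.5 <= m < 2
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y) # Newton step for f(y) = 1/y**2 - m
    return math.ldexp(y, -(e // 2))     # 1/sqrt(a) = (1/sqrt(m)) * 2**(-e/2)

def newton_sqrt(a):
    """sqrt(a) recovered as a * (1/sqrt(a))."""
    return a * newton_rsqrt(a)
```

The two loops have the same shape: a handful of multiplies and subtracts per step, with the number of correct bits roughly doubling each time. Note also that the square root halves the exponent, which is why it cannot overflow or underflow, while the reciprocal negates it.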
https://en.m.wikipedia.org/wiki/Fast_inverse_square_root