by uxcn 3749 days ago
The latency for sqrtss on Broadwell is 11 cycles with a (reciprocal) throughput of 4, whereas mul has a latency of 3 and a throughput of 1. So, using some concrete numbers, sqrt is more expensive, but not polynomially more expensive, or even by an order of magnitude.

Right, with modern floating-point implementations it's not the old guess-and-iterate method any more. SQRT is probably now on the order of an inverse?
I honestly haven't looked into it in that much detail, but if I had to guess... probably. I'm not sure what implementation is currently used in hardware, but at the very least it's constant-bounded.

Conversions are still somewhat expensive, but I do know that, compared to a polynomial-time approach or a large enough constant, a conversion can be the better choice for an optimization. For example, computing log10 of an integer.
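
A rough sketch of the log10 case in C. The comment doesn't say which method it has in mind, so this is only one plausible reading: convert the integer to a double, read floor(log2(n)) out of the exponent field, scale by an approximation of log10(2), and fix up with a small table. The function names are made up for illustration, and the code assumes IEEE 754 binary64 doubles and a nonzero 32-bit input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Baseline: floor(log10(n)) by repeated division (one division per digit). */
static unsigned ilog10_loop(uint32_t n)
{
    assert(n != 0);
    unsigned result = 0;
    while (n >= 10) {
        n /= 10;
        ++result;
    }
    return result;
}

/* floor(log2(n)) via an int-to-double conversion.
   Assumes IEEE 754 binary64; every uint32_t converts exactly,
   so the unbiased exponent is exactly floor(log2(n)). */
static unsigned ilog2_via_double(uint32_t n)
{
    assert(n != 0);
    double d = (double)n;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);            /* well-defined type pun */
    return (unsigned)((bits >> 52) & 0x7FF) - 1023u;
}

/* floor(log10(n)) in a constant number of steps:
   1233/4096 approximates log10(2), so the shifted product is an
   estimate that is either correct or one too high; the table
   lookup corrects the latter case. */
static unsigned ilog10_via_double(uint32_t n)
{
    static const uint32_t pow10[10] = {
        1u, 10u, 100u, 1000u, 10000u, 100000u,
        1000000u, 10000000u, 100000000u, 1000000000u
    };
    unsigned guess = ((ilog2_via_double(n) + 1u) * 1233u) >> 12;
    return guess - (n < pow10[guess]);
}
```

The loop costs up to nine divisions for a 32-bit value, while the conversion version is one convert, one multiply, one shift and one compare regardless of the input. Whether that actually wins depends on the target; on x86 a bit-scan or lzcnt instruction would usually replace the conversion entirely.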