|
|
|
|
|
by uxcn
3749 days ago
|
|
The latency for sqrtss on broadwell is 11 cycles with a throughput of 4, where mul is a latency of 3 and throughput of 1. So, using some concrete numbers, sqrt is more expensive, but not polynomially or even an order of magnitude. |
|