| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by MattyDub 1058 days ago
	Can somebody explain why a square root is also considered a flop? Surely that involves more work than the other four operations the article listed. Is there some hardware algorithm for the square root that is as fast as (e.g.) division?

3 comments

stephencanon 1058 days ago

Square root is pretty much equivalent to division in complexity, and computed by similar techniques (digit-by-digit methods or newton-raphson or goldschmidt iterations). Division is often a little more efficient, but square root has fewer messy edge cases (it never overflows nor underflows).

Division and square root are generally slower than the other arithmetic operations, in both latency and throughput. They are finally partially pipelined in recent CPUs (a result every two or three cycles), but were totally unpipelined in mainstream designs for many years before that. A decade ago, they might take a few tens of cycles, now they’re generally somewhere around ten cycles latency on “real” CPUs, vs 3-5 cycles latency for the other floating point arithmetic instructions.

link

herodoturtle 1058 days ago

Relevant Quake engine trivia, for those that are interested:

https://en.m.wikipedia.org/wiki/Fast_inverse_square_root

link

somat 1058 days ago

A question that I am not quite smart enough to figure out. The fast inverse square root works via a giant hack. hand waving a bit here but it has to do with because floats are stored as an exponential structure bit shifting one does magic math things to it. however note that bit shifting a float is not an implemented instruction. so you have to hack it by casting to an int.

Anyway the question is. If you implemented fast inverse square root with out the hack. (i don't know, perhaps packing your own bits, or it might be screwball enough you only assembly could do it.) would it be as fast?

link

stephencanon 1058 days ago

It's not much of a "hack", it's literally how the number format works. If you implemented it without the "hack", you'd just be at the mercy of the compiler to see through your arithmetic on the float bit patterns and reconstitute it.

It's not actually that fast, either. Every mainstream architecture from the last 15 years (except maybe RISC-V, since it's exactly the sort of thing they would forget to add) has a "approximate square root" instruction with single-cycle throughput, and they're all both more accurate and more efficient than this. It was good for its time, but it's time has passed.

link

lmm 1058 days ago

Guess: It's one of the operations specified in IEEE 754, and traditionally the implementation is supplied by your hardware vendor rather than implemented by you (even if that implementation is "really" a software algorithm).

link

jjgreen 1058 days ago

On x86-64, there are instructions, see https://stackoverflow.com/a/54642811/468334

link