A question that I am not quite smart enough to figure out. The fast inverse square root works via a giant hack. I'm hand-waving a bit here, but because floats are stored as an exponent plus a mantissa, bit-shifting the raw bits does magic math things to the value. Note, however, that bit-shifting a float is not an implemented instruction, so you have to hack it by casting to an int.
Anyway, the question is: if you implemented fast inverse square root without the hack (I don't know, perhaps by packing your own bits, or it might be screwball enough that only assembly could do it), would it be as fast?
It's not much of a "hack"; it's literally how the number format works. If you implemented it without the "hack", you'd just be at the mercy of the compiler to see through your arithmetic on the float bit patterns and reconstitute it.
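For anyone following along, here is a rough sketch of what the trick looks like in C. It assumes `float` is a 32-bit IEEE 754 value, and the name `fast_rsqrt` is just made up for illustration. The constant is the well-known one from the Quake III source; `memcpy` is used for the reinterpretation instead of a pointer cast, since that is the well-defined way to get at the bits and compilers typically turn it into a plain register move.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x). Only meaningful for positive, finite, normal x.
 * Assumes float is 32-bit IEEE 754 (an assumption, not checked here). */
static float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float's bit pattern as an integer. */
    memcpy(&bits, &x, sizeof bits);

    /* The bit pattern is roughly a scaled, offset log2(x). Shifting right
     * halves that logarithm, and subtracting from the magic constant
     * negates it and fixes up the offset, giving roughly log2(x^(-1/2)). */
    bits = 0x5f3759dfu - (bits >> 1);

    /* Reinterpret the integer back as a float: the initial guess. */
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to tighten the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

Calling `fast_rsqrt(4.0f)` should give something within a fraction of a percent of 0.5. Everything between the two `memcpy` calls is ordinary integer arithmetic, which is why there is not really a "non-hack" version to compare against.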
It's not actually that fast, either. Every mainstream architecture from the last 15 years (except maybe RISC-V, since it's exactly the sort of thing they would forget to add) has an "approximate reciprocal square root" instruction with single-cycle throughput, and they're all both more accurate and more efficient than this. It was good for its time, but its time has passed.