|
|
|
|
|
by thoughtFrame
305 days ago
|
|
I've read about the fast inverse square root trick, but it didn't occur to me that it can be used for other formulas or operations. Is this a common trick in DSP/GPU-like architectures nowadays? And what's the mathematical basis? (that is, is this technique formalized anywhere?) It seems insane to me that you run Newton's algorithm straight on the IEEE 754 format bits and it works, what with the exponent in excess coding and so on |
|
It turns out you do have an ultra fast way to compute log_2: you bitcast a float to an integer, and then twiddle some bits. The first 8 bits (after the sign bit, which is obviously zero because we're assuming our input is positive) or whatever are the exponent, and the trailing 23 bits are a linear interpolation between 2^n and 2^(n+1) or whatever. exp_2 is the same but in reverse.
You can simply convert the integer to floating point, multiply by -.5, then convert back to integer. But multiplying -.5 by x can be applied to a floating point operating directly on its bits, but it's more complicated. You'll need to do some arithmetic, and some magic numbers.
So you're bitcasting to an integer, twiddling some bits, twiddling some bits, twiddling some bits, twiddling some bits, and bitcasting to a float. It turns out that all the bit twiddling simplifies if you do all the legwork, but that's beyond the scope of this post.
So there you go. You've computed exp_2(-.5 log_2 x). You're done. Now you need to figure out how to apply that knowledge to the inverse square root.
It just so happens that 1/sqrt(x) and exp(-.5 log x) are the same function. exp(-.5 log x) = exp(log(x^-.5)) = x^-.5 = 1/sqrt(x).
Any function where the hard parts are computing log_2 or exp_2 can be accelerated this way. For instance, x^y is just exp_2(y log_2 x).
Note that in fast inverse square root, you're not doing Newton's method on the integer part, you're doing it on the floating point part. Newton's method doesn't need to be done at all, it just makes the final result more accurate.
Here's a blog here that gets into the nitty gritty of how and why it works, and a formula to compute the magic numbers: https://h14s.p5r.org/2012/09/0x5f3759df.html