Hacker News new | ask | show | jobs
by dahart 530 days ago
EDIT: after double-checking my work, I realized I have a better bound on maximum error, but not a better average error. So, the magic number depends on the goal or metric, but mean relative error seems reasonable. Leaving my original comment here, but note the big caveat that I’m half wrong.

One can do better still - 0x7EF311BC is a near optimal value at least for inputs in the range of [0.001 .. 1000].

The simple explanation here is:

The post’s number 0x7EFFFFFF results in an approximation that is always equal to or greater than 1/x. The value 0x7EEEEBB3 is better, but it’s less than 1/x around 2/3rds of the time. My number 0x7EF311BC appears to be as well balanced as you can get, half the time greater and half the time less than 1/x.

To find this number, I have a Jupyter notebook that plots the maximum absolute value of relative error over a range of inputs, for a range of magic constants. Once it’s setup, it’s pretty easy to manually binary search and find the minimum. The plot of max error looks like a big “V”. (Edit while the plot of mean error looks like a big “U” near the minimum.

The optimal number does depend on the input range, and using a different range or allowing all finite floats will change where the optimal magic value is. The optimal magic number will also change if you add one or more Newton iterations, like in that github snippet (and also seen in the ‘quake trick’ code).

PPS maybe 0x7EF0F7D0 is a pretty good candidate for minimizing the average relative error…?

1 comments

Your suggestion got me intrigued. I have a program that does an exhaustive check for maximum and average error, so I'll give your numbers a spin.
Given my search criteria, the optimal magic number turns out to be: 0x7ef311c2

  Initial approximation:
    Good bits min: 4
    Good bits avg: 5.242649912834
    Error max: 0.0505102872849 (4.30728 bits)
    Error avg: 0.0327344845327 (4.93304 bits)

  1 NR step:
    Good bits min: 8
    Good bits avg: 10.642581939697
    Error max: 0.00255139507338 (8.61450 bits)
    Error avg: 0.00132373889641 (9.56117 bits)

  2 NR steps:
    Good bits min: 17
    Good bits avg: 19.922843217850
    Error max: 6.62494557693e-06 (17.20366 bits)
    Error avg: 2.62858584054e-06 (18.53728 bits)

  3 NR steps:
    Good bits min: 23
    Good bits avg: 23.674004554749
    Error max: 1.19249960972e-07 (22.99951 bits)
    Error avg: 3.44158509521e-08 (24.79235 bits)
Here, "good bits" is 24 minus the number of trailing non-zero-bits in the integer difference between the approximation and the correct value, looking at the IEEE 754 binary representation (if that makes sense).

Also, for the NR steps I used double precision for the inner (2.0 - x * y) part, then rounded to single precision, to simulate FMA, but single precision for the outer multiplication.

Ah very nice, I was close with using max error - 0.05051 is the same number I got. Pretty sure 0x7ef311c2 came up for me at least a few times as I was fiddling with parameters. Is this using minimum good bits as the deciding criteria, or is it the best overall number using one of the averages and also 1-3 NR steps? Did you limit the input range, or use all finite floats? Having the min/avg error in bits is nice, it’s more intuitive than relative error.

I like the FMA simulation, that’s smart; I didn’t think about it. I did my search in Python. I don’t have it in front of me right now, and off the top of my head I’m not even sure whether my NR steps are in Python precision or fp32. :P My posts in this thread were with NR turned off, I wanted to find the best raw approximation and noticed I got a different magic number when using refinement. It really is an amazing trick, right? Even knowing how it works it still looks like magic when plotting the result.

Thanks for the update!

BTW I was also fiddling with another possible trick that is specific to reciprocal. I suspect you can simply negate all the bits except the sign and get something that’s a decent starting point for Newton iters, though it’s a much worse approximation than the subtraction. So maybe (x ^ 0x7fffffff). Not sure if negating the mantissa helps or if it’s better to negate only the exponent. I haven’t had time to analyze it properly yet, and I don’t know of any cases where it would be preferred, but I still think it’s another interesting/cute observation about how fp32 bits are stored.

When measuring the errors I exhaustively iterate over all possible floats in the range [1, 2), by enumerating all IEEE 754 single precision representations in that range. That's "only" 2^23 numbers, so perfectly doable.

My selection criteria was abit complex, but something like this:

1. Maximize number of accurate bits in the approximation.

2. Same in NR step 1, then NR step 2 etc.

3. Minimize the max error in the approximation, and then the avg ertor in the approximation.

4. Same for NR step 1, 2, ...