Hacker News
by turtledragonfly 1092 days ago
I suppose implicit in my assumptions is that if "1" is the number I care about, that's the sort of values I'm going to be working with w/regard to my target data.

So, if I am doing some +1/-1 sort of math on a bunch of numbers, and those numbers are "far away" (eg: near 1e+8 or near 1e-8), then it is better to transform those numbers near "1 space", do the math, then transform it back, rather than trying to do it directly in that far-away space.

But yes, I suppose in your phrasing, that does come down to the ratio of the numbers involved — 1 vs 1e±8. You want that ratio to be as near 1 as possible, I think is what you mean by "limit the ratio"?

Well "1" won't consistently be "the typical amount you add/subtract" and "the typical number you care about" at the same time.

Like, a bank might want accuracy of 1e-4 dollars, have transactions of 1e2-1e5 dollars, and have balances of 1e5-1e8 dollars.

That's three ranges we care about, and at most one of them can be around 1.0. But which one we pick, or picking none at all, won't affect the accuracy. The main thing affecting accuracy is the ratio between the biggest and smallest numbers, which in this case is 1e12 (1e8 dollars over 1e-4 dollars).

If you set pennies to be 1.0, or basis points to be 1.0, or a trillion dollars to be 1.0, you'd get the same accuracy. Let's say some calculation is off by .0000003 pennies from perfect math. All those versions will be off by .0000003 pennies. (Except that there might be some jitter in rounding based on how the powers align, but let's ignore that for right now.)
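To make that claim concrete, here is a minimal sketch that repeats the same float32 computation under several choices of "1.0". The `f32` helper, the amounts, and the unit list are all made up for illustration; the helper emulates 32-bit floats by round-tripping through `struct`. The relative errors should land in the same ballpark for every unit, and only units that differ by an exact power of two are guaranteed to match bit for bit.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def relative_error(unit_in_dollars, balance=123456.7891, deposit=19.9873, count=1000):
    """Add `deposit` to `balance` `count` times in float32, with all amounts
    expressed in the chosen unit, and return the relative error of the result."""
    total = f32(balance / unit_in_dollars)
    step = f32(deposit / unit_in_dollars)
    for _ in range(count):
        total = f32(total + step)
    exact = Fraction(balance) + count * Fraction(deposit)  # exact sum, in dollars
    got = Fraction(total) * Fraction(unit_in_dollars)      # converted back exactly
    return float(abs(got - exact) / exact)

# Hypothetical choices of what "1.0" stands for, expressed in dollars.
units = {
    "penny": 0.01,
    "dollar": 1.0,
    "2^40 dollars": 2.0 ** 40,
    "trillion dollars": 1e12,
}
errors = {name: relative_error(unit) for name, unit in units.items()}
for name, err in errors.items():
    print(f"{name:>18}: relative error {err:.2e}")
```

The amounts deliberately include fractions of a cent, so no unit in the list gets the unfair advantage of making every value an exact integer.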

There's something I'm not quite getting, here.

Let's take your bank example, with 32-bit floats. Since you say it doesn't matter, let's set "1" to be "1 trillion dollars" (1e12). A customer currently has a balance of 1 dollar, so it's represented as 1e-12. Now they make 100 deposits, each of a single dollar. If we do these deposits one-at-a-time, we get a different result than if we do a single deposit of $100, thanks to accumulated rounding errors. Ok, fine.

Now we choose a different "1" value. You say "which one we pick, or picking none at all, won't affect the accuracy," but I think in this case it _does_? In this second case, we set "1" to be 1 dollar, and we go through the same deposits as above. In this case, both algorithms (incremental and +$100 at once) produce identical results — 101, as expected.
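The experiment described here can be sketched as follows. This is only an illustration: `f32` and `run_deposits` are hypothetical helpers, and float32 is emulated by round-tripping Python doubles through `struct`. With "1" as one dollar, both paths give exactly 101. With "1" as a trillion dollars, the two paths may disagree in the last bits, because 1e-12 is not exactly representable in binary.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def run_deposits(unit_in_dollars, start=1.0, each=1.0, count=100):
    """Return (incremental, lump): the balance after `count` separate deposits
    versus one deposit of `count * each`, both computed in float32."""
    opening = f32(start / unit_in_dollars)
    one_deposit = f32(each / unit_in_dollars)

    incremental = opening
    for _ in range(count):
        incremental = f32(incremental + one_deposit)

    lump = f32(opening + f32(count * each / unit_in_dollars))
    return incremental, lump

inc_dollar, lump_dollar = run_deposits(1.0)        # "1" means one dollar
inc_trillion, lump_trillion = run_deposits(1e12)   # "1" means a trillion dollars

print("unit = $1:   ", inc_dollar, lump_dollar)
print("unit = $1e12:", inc_trillion * 1e12, lump_trillion * 1e12, "(scaled back to dollars)")
```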

I agree that there can be multiple ranges that we care about, which can be tricky, but I don't agree that it doesn't matter what "1" we pick.

But I am probably misinterpreting you in some way (:

If you can avoid rounding then your answers will be more accurate. But that's almost entirely separate from how big your "1" is.

If you set "1" to be "2^40 dollars (~1.1 trillion)", then $1 is represented as 2^-40. Adding that up 100 times will have no rounding, and give you exactly the same result as a deposit of $100.

On the opposite side of things, setting "1" to be "3 dollars" or "70 cents" would cause rounding errors all over, even though that's barely different in scale.
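A quick way to check both halves of this argument is to add "one dollar" 100 times in float32 under each unit and measure the error exactly with `Fraction`. This is a sketch under assumptions: `f32` and `accumulated_error_in_dollars` are illustrative names, and float32 is emulated via `struct`. The power-of-two unit should come out with zero error. The other two cannot, since 100/3 and 1000/7 have no finite binary expansion.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulated_error_in_dollars(unit_in_dollars, count=100):
    """Add one dollar `count` times in float32, where 1.0 stands for
    `unit_in_dollars`, and return the exact absolute error in dollars."""
    one_dollar = f32(float(1 / unit_in_dollars))
    total = 0.0
    for _ in range(count):
        total = f32(total + one_dollar)
    return abs(Fraction(total) * unit_in_dollars - count)

error_pow2 = accumulated_error_in_dollars(Fraction(2) ** 40)     # "1" = 2^40 dollars
error_three = accumulated_error_in_dollars(Fraction(3))          # "1" = 3 dollars
error_seventy = accumulated_error_in_dollars(Fraction(7, 10))    # "1" = 70 cents

print("unit 2^40 dollars:", float(error_pow2))
print("unit 3 dollars:   ", float(error_three))
print("unit 70 cents:    ", float(error_seventy))
```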

Okay, I think we are basically on the same page.

But since I'm finding this helpful ... (:

We've been talking about addition so far, and relative scales between numbers. But suppose we just consider a single number, and multiply it by itself repeatedly.

Certainly if that number is 1, we can keep doing it forever without error.

But the further we get away from 1 (either 1e+X or 1e-X), the more quickly error will be generated from that sequence, eventually hitting infinity or zero.
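As a rough illustration of the overflow and underflow part of this, the sketch below keeps multiplying a running product by the same factor and counts the steps until the result saturates. It uses Python's native 64-bit floats, and `steps_until_saturation` is a hypothetical helper. A factor of 1.0 never saturates, while 1e8 and 1e-8 each run out of exponent range after a few dozen multiplications.

```python
import math

def steps_until_saturation(factor, limit=10_000):
    """Multiply a running product by `factor` until it becomes 0.0 or inf.

    Returns the number of multiplications performed, or None if the product
    is still finite and nonzero after `limit` steps."""
    product = factor
    for step in range(1, limit + 1):
        product *= factor  # float multiplication overflows to inf, underflows to 0.0
        if product == 0.0 or math.isinf(product):
            return step
    return None

steps_one = steps_until_saturation(1.0)
steps_big = steps_until_saturation(1e8)
steps_small = steps_until_saturation(1e-8)

print("factor 1.0: ", steps_one)
print("factor 1e8: ", steps_big)
print("factor 1e-8:", steps_small)
```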

I'm just trying to express through this example that there is still something "special" about 1 in scale (likewise 0, in offset), where you want to be "close to" it, in the face of doing some arbitrary math, in order to produce better results. It doesn't even need to involve relative sizes between 2 different numbers.

It depends on what math you're doing. Very often you're likely to find that a base number of 1e-6 makes you less likely to hit your exponent limits than a base number of 1.

1 is special in that half the positive floats are above it and half are below. That doesn't mean your use case wants half.
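The "half above, half below" observation can be checked by counting bit patterns, since positive IEEE 754 floats sort in the same order as their integer encodings. The sketch below does this for float32. The names are illustrative, and the count below 1 includes subnormals but excludes zero, infinity, and NaN. The split should come out close to even rather than exactly even.

```python
import struct

def f32_bits(x):
    """Return the 32-bit integer encoding of x as an IEEE 754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

one_bits = f32_bits(1.0)
largest_finite_bits = 0x7F7FFFFF  # encoding of the largest finite float32

# Positive floats strictly below 1.0 use encodings 1 .. one_bits - 1.
count_below = one_bits - 1
# Positive finite floats strictly above 1.0 use one_bits + 1 .. largest_finite_bits.
count_above = largest_finite_bits - one_bits

share_below = count_below / (count_below + count_above)
print(f"below 1.0: {count_below}, above 1.0: {count_above}, share below: {share_below:.4f}")
```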

Then would it be fair to say that if you don't know what calculations might be coming, all other things being equal, 1 is a good choice since it is "unbiased" in this sense?