by jsweojtj 2310 days ago
This is exactly the question I was going to ask.

I wrote: > The leading digits of a uniform distribution does not follow Benford's law.

And @EGreg wrote: > I’m sorry to tell you this, but you inadvertently misled people with that empirical test. This just goes to show that we have to check our assumptions, as scientists or mathematicians trying to prove a statement. (Even with empirical tests :)

So, what specific range of the uniform distribution yields leading digits that follow Benford's law?


Literally any range with min = 0 and where the max isn’t a power of 10.

For example 0-300

One third of numbers are evenly distributed: 0-100

One third starts with 1: 100-200

One third starts with 2: 200-300

Do you understand?

I understand.

There is a distribution of leading digits that looks like:

    d   P(d)
    1   30.1%   
    2   17.6%   
    3   12.5%   
    4   9.7%    
    5   7.9%    
    6   6.7%    
    7   5.8%    
    8   5.1%    
    9   4.6%    
 
As Wikipedia says, "It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, house prices, population numbers, death rates, lengths of rivers, physical and mathematical constants."
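For anyone who wants to reproduce the table above: the percentages match the usual base-10 statement of Benford's law, P(d) = log10(1 + 1/d). A minimal sketch follows; the function name `benford_probability` is made up for illustration and is not from the thread.

```python
import math


def benford_probability(d: int) -> float:
    """Benford probability of leading digit d in base 10 (hypothetical helper)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    print("d   P(d)")
    for d in range(1, 10):
        print(f"{d}   {benford_probability(d):.1%}")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10).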

Neat! For each of those data sets you get the same distribution. Now, someone (I won't say who), says that it also is true for the uniform distribution.

But it isn't.

It simply isn't.

And I said as much when I said, "The leading digits of a uniform distribution does not follow Benford's law."

And your counter example is if you take a uniform distribution from 0-300, the leading digits go to something like:

    d   P(d)
    1   36.7%
    2   36.7%
    3   3.7%
    4   3.7%
    5   3.7%
    6   3.7%
    7   3.7%
    8   3.7%
    9   3.7%

Great, so I don't know how we can disagree at this point. The above distribution is not Benford's Law.
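The 0-300 figures can be checked by direct counting. A rough sketch, assuming "0-300" means the integers 1 through 300 inclusive (0 is left out because it has no leading significant digit); `leading_digit_shares` is an illustrative name, not something from the thread.

```python
from collections import Counter


def leading_digit_shares(max_value: int) -> dict[int, float]:
    """Share of the integers 1..max_value that start with each digit 1-9."""
    counts = Counter(int(str(n)[0]) for n in range(1, max_value + 1))
    return {d: counts[d] / max_value for d in range(1, 10)}


if __name__ == "__main__":
    print("d   P(d)")
    for d, share in leading_digit_shares(300).items():
        print(f"{d}   {share:.1%}")
```

Under that assumption, digits 1 and 2 each cover 111 of the 300 numbers, so they come out equal, which is the point being argued here.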

> "The leading digits of a uniform distribution does not follow Benford's law." -- me

And you, directly disagreeing with that correct statement:

> This just goes to show that we have to check our assumptions, as scientists or mathematicians trying to prove a statement. -- EGreg

Indeed.

That's not Benford's law though. That's just a weird distribution due to a weird cutoff.

Benford's law is 1: 30.1%, 2: 17.6%, 3: 12.5%, etc.

For the record, you’re moving the goalposts. The OP claimed that his example proves that the digits always have the same chance of appearing, which is clearly false.

When the max is uniformly distributed then Benford’s law emerges. I mean, all you have to do is read the link - where I derive it.

What exactly is the law — please don’t handwave. If the law is those exact point values mentioned in the article then I just showed you how we arrived at them.

What you are describing is not even the result of a uniform distribution. It's a two-step process involving two uniform distributions. The end result is some weird non-uniform, downward-sloping distribution.

That’s because we aren’t trying to look at one specific uniform distribution. We were asking why Benford’s law happens for almost all processes that follow a uniform distribution and record the result in positional notation with digits: namely, that 1 appears a lot more than 2, which appears a lot more than 3, etc. Roughly in the proportion that 1 is twice that of 2, which is 1/3 more than 3, etc.

(Btw, it is NOT true for, e.g., dictionary words: an initial A doesn't appear more than B. That should tell you something!)

And to understand the reason we just have to look at the family of uniform distributions, and see that for almost all of them, this proportion holds. Sure, for some of them, the 1,2,3 may be even MORE prevalent relative to 4-9 because the maximum value was 400 or 4000 or 40000. Ok? You can see this. For a uniformly distributed process that happens to have that as the maximum, Benford’s law will have the same proportions between 1,2,3 but then drop for 4-9 since they didn’t get that “boost”.

But if you keep sampling and this maximum keeps growing by some continuous distribution that’s not perfectly synced with the metric system, then it’s as likely to be in the range 100-200 as it is to be in 200-300. And then as likely to be in 1000-2000 as in 2000-3000. Given that, we get something like Benford’s law.

Now, perhaps it is ALSO TRUE for other distributions. I just explained why it’s true for uniform ones.

If you took a random letter in the alphabet and then sampled from any letter before this letter, you would get more samples from the earlier letters of the alphabet. That is because the two-step process discards higher letters in the first step. This is not a uniform distribution and is not Benford's law. It's just a weird two-step process that over-samples earlier letters.

You just keep going round and round with handwaving that makes no sense. I read your link. I did not see Benford's law emerging anywhere in your link.

What does "max is uniformly distributed" even mean? If you think that Benford's law holds for a set of uniformly distributed numbers, why not simply provide that set? It would be so easy to prove your claim if you just provided an example set of numbers that obeys Benford's law.

All sets of numbers you have presented so far (0-300, 0-30000, 0-300000000000000000000000) do not follow Benford's law. It is very simple to show. In all these sets, the probability of first digit as 1 is equal to the probability of first digit as 2 which contradicts Benford's law.

That’s because you aren’t trying to find the probability of a digit given any SPECIFIC maximum, you are trying to sum the probability of the digit given that the maximum is in a certain range, over all ranges.

> With large ranges, even if you exclude a power of 10 in the upper bound, it does not change the 11.11% chance of each digit being the first digit.

That is JUST FALSE, ok? For pretty much any distribution you choose for the max, other than 100% chance it is a power of 10 and 0% chance other numbers, you’ll get that the digit 1 comes up way more than 2, which would come up more than 3, etc. How much more? This comes from the fact that there are just as many numbers 100-200 as there are 0-100. Ok? And that’s all 1s. Then you hit the 2s, and so on.

If the max happens to be anywhere in the range 100-1000 with equal probability, you get that result. Benford’s law. If the max is distributed as some sort of continuous distribution — and not that ridiculous distribution of ONLY ever being powers of 10 — then you likely get something similar.

What are you arguing about?? If you are saying it’s mysterious why the lower digits come up more than higher ones, well the mystery is over. If you want an EXACT fit to the numbers in the article then I think they come out whenever the max is uniformly distributed between 10^n and 10^(n+1). But they may also have a sort of “law of large numbers” thing where pretty much any continuous distribution of the max leads to this law. That part I can’t tell you. What I can tell you is OBVIOUSLY the lower digits will come out more frequently.
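The two-step process under discussion (first pick a maximum, then pick a number up to that maximum) can be computed exactly rather than argued about. A sketch under stated assumptions: the maximum M is an integer drawn uniformly from 100..999, the sample is an integer drawn uniformly from 1..M, and the function names are invented for illustration. It prints the resulting leading-digit shares next to log10(1 + 1/d) so the two columns can be compared directly.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """First significant digit of a positive integer."""
    return int(str(n)[0])


def shares_for_max(m: int) -> dict[int, float]:
    """Leading-digit shares for integers drawn uniformly from 1..m."""
    counts = Counter(leading_digit(n) for n in range(1, m + 1))
    return {d: counts[d] / m for d in range(1, 10)}


def shares_for_uniform_max(lo: int, hi: int) -> dict[int, float]:
    """Average the per-maximum shares over every maximum in lo..hi-1,
    i.e. treat the maximum itself as uniformly distributed on that range
    (an assumption made for this sketch)."""
    maxima = range(lo, hi)
    totals = dict.fromkeys(range(1, 10), 0.0)
    for m in maxima:
        for d, share in shares_for_max(m).items():
            totals[d] += share
    return {d: total / len(maxima) for d, total in totals.items()}


if __name__ == "__main__":
    two_step = shares_for_uniform_max(100, 1000)
    print("d   two-step   log10(1 + 1/d)")
    for d in range(1, 10):
        print(f"{d}   {two_step[d]:8.1%}   {math.log10(1 + 1 / d):.1%}")
```

Changing `lo` and `hi` lets you try other ranges for the maximum and see how much the shape depends on that choice.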

The numbers in the range 0-300 do not obey Benford's law. In base 10, a set of numbers obeys Benford's law if the leading significant digit d (0 < d < 10) occurs with probability log10(1 + 1/d). This isn't the case for the set of numbers between 1 and 300, inclusive.

Your assertion that for large ranges every digit has the same chance of appearing is very wrong. Your empirical test is rigged by choosing a very rare max, literally the only one where it would “prove” your assertion.

Benford’s law appears when the max of your range is uniformly distributed.

If you present a weird distribution to begin with, it should not be surprising that every digit does not have the same chance of appearing. That's not the point. We are not talking about weird distributions here.

If we are going to argue like this, I might as well present a set of two numbers S = {1, 2} and claim that when we choose numbers from uniform distribution, the probability of 3 occurring as the first digit is 0. Other commenters are not assuming weird distributions like this because this kind of discussion does not provide any new insights and is just a waste of time.

You can create all the strawmen you want. I am going to quote from Wikipedia:

The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.

I have explained why that happens for the vast majority of UNIFORMLY DISTRIBUTED VARIABLES.

The vast majority. That implies that there is a collection of all possible uniformly distributed variables, and in particular those that are sampled from real world processes.

As long as they are uniformly distributed, with 0 as the minimum and M as the maximum, the lower digits will appear more commonly as the first digit.

I explained it several times. Why are you still insisting that statements about MAJORITY of uniform distributions are weird?

Yes, statements about collections of uniform distributions are not statements about ONE SPECIFIC uniform distribution. And?

Can you provide an example range of uniformly distributed integers that obeys Benford's law?