by eesmith 610 days ago
The distribution of digits is 'highly imbalanced' because that's what random samples look like. I'll randomly select a digit from 0-9 10,000 times and show the distribution, then do the same with the first 10,000 digits of pi, then do the random selection again:

  >>> import random
  >>> from collections import Counter
  >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
  >>> for digit, count in ctr.most_common():
  ...   print(f"{digit}: {count}")
  ...
  2: 1039
  4: 1035
  0: 1031
  7: 1022
  3: 1008
  6: 998
  1: 976
  5: 973
  9: 963
  8: 955
  >>> pi_ctr = Counter(open("1-10000.txt").read().rstrip())
  >>> for digit, count in pi_ctr.most_common():
  ...   print(f"{digit}: {count}")
  ...
  5: 1046
  1: 1026
  2: 1021
  6: 1021
  9: 1014
  4: 1012
  3: 974
  7: 970
  0: 968
  8: 948
  >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
  >>> for digit, count in ctr.most_common(): print(f"{digit}: {count}")
  ...
  8: 1060
  2: 1048
  0: 1034
  4: 1026
  5: 1025
  3: 979
  7: 977
  6: 960
  1: 956
  9: 935
You can see that the distribution of pi's first 10,000 digits is what one should expect for a random distribution. If your method requires a 50/50 distribution then it cannot be used for this purpose.
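
One way to make "what one should expect" concrete is a chi-square goodness-of-fit check against a uniform distribution. This is only a minimal sketch using the pi counts printed above; the variable names are mine, and the 5% critical value for 9 degrees of freedom (16.919) comes from a standard chi-square table rather than from this thread.

```python
# Digit counts for the first 10,000 digits of pi, copied from the output above.
pi_counts = {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
             4: 1012, 3: 974, 7: 970, 0: 968, 8: 948}

total = sum(pi_counts.values())       # 10,000 digits in all
expected = total // len(pi_counts)    # 1,000 per digit if perfectly uniform

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((obs - expected) ** 2 for obs in pi_counts.values()) / expected

# Critical value for 9 degrees of freedom at the 5% level (standard table).
CRITICAL_5_PERCENT_DF9 = 16.919

print(f"chi-square = {chi2:.3f}")
print("consistent with uniform at the 5% level:", chi2 < CRITICAL_5_PERCENT_DF9)
```

The statistic comes out well below the critical value, so these counts give no reason to reject uniformity.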

Also, you are thinking about it wrong. The first 10,000 digits of pi are perfectly predictable.


I'm not predicting the number; I'm predicting number%2==0. The model predicted better than the distribution probability.
It doesn't really matter. There are 4970 even digits and 5030 odd digits in the first 10,000. Predicting "odd" every time makes you right 50.3% of the time, which is already better than even.
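
The even/odd totals follow directly from the per-digit counts of pi shown earlier. A short sketch of that arithmetic, with `baseline` as my own name for the accuracy of always guessing the more common parity:

```python
# Digit counts for the first 10,000 digits of pi, copied from the earlier output.
pi_counts = {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
             4: 1012, 3: 974, 7: 970, 0: 968, 8: 948}

even = sum(n for digit, n in pi_counts.items() if digit % 2 == 0)
odd = sum(n for digit, n in pi_counts.items() if digit % 2 == 1)

# Accuracy of a "model" that ignores its input and always guesses the majority parity.
baseline = max(even, odd) / (even + odd)

print(f"even: {even}, odd: {odd}, always-guess-majority accuracy: {baseline:.1%}")
```

Any parity predictor has to be compared against this baseline, not against 50%.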

What does "highly unbalanced" mean?

How often will a random sequence be "highly unbalanced"?
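
That question can be answered by simulation. The sketch below assumes "unbalanced" refers to the even/odd split and uses a gap of 30 from a perfect 5000/5000 split, matching the 4970/5030 split in pi's first 10,000 digits; the function names, trial count, and seed are arbitrary choices of mine.

```python
import random

def even_count(n_digits, rng):
    # Each uniformly random digit is even with probability 1/2, so the number
    # of even digits has the same distribution as the number of 1-bits in
    # n_digits fair random bits.
    return bin(rng.getrandbits(n_digits)).count("1")

def fraction_at_least_as_unbalanced(n_digits=10_000, gap=30, trials=2_000, seed=0):
    """Fraction of random sequences whose even count is >= gap away from half."""
    rng = random.Random(seed)
    half = n_digits // 2
    hits = sum(abs(even_count(n_digits, rng) - half) >= gap for _ in range(trials))
    return hits / trials

frac = fraction_at_least_as_unbalanced()
print(f"{frac:.1%} of simulated sequences are at least as lopsided")
```

The standard deviation of the even count is sqrt(10000 * 0.25) = 50, so a gap of 30 is well within one standard deviation and shows up in roughly half of all random sequences.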

How many people used another model, found no pattern, and never reported it?

You have plenty of data to work with. Try the second 10,000, the third 10,000 and so on.
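
A rough sketch of how the block-by-block check could be set up. `block_parity_counts` is a hypothetical helper, and it assumes its input string contains only digit characters (no "3." prefix, spaces, or newlines).

```python
def block_parity_counts(digits, block_size=10_000):
    """Return an (even, odd) pair for each full block of `digits`.

    A trailing partial block is ignored. Assumes `digits` holds only the
    characters 0-9, so every non-even character is counted as odd.
    """
    counts = []
    for start in range(0, len(digits) - block_size + 1, block_size):
        block = digits[start:start + block_size]
        even = sum(ch in "02468" for ch in block)
        counts.append((even, block_size - even))
    return counts

# Tiny stand-in input so the sketch runs without a data file.
print(block_parity_counts("1234567890", block_size=5))
```

With a longer digit file read the same way as 1-10000.txt above, this gives the majority-parity baseline for each block. A model that has found a real pattern should beat that baseline on blocks it was not fitted on.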

Keep clear in your mind that a lot of people worked on this problem, including trained mathematicians. It is far more likely that you do not fully understand what you are doing than that they are wrong. Believing otherwise is the path of crankdom.

Better to use statistical significance tests to talk about what is "far more likely".
It doesn't predict better than even; it predicts better than the distribution probability.
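
For what it's worth, a significance test against the distribution probability could look something like this. It is a sketch using a normal approximation to the binomial; the function name is mine, and the 5,100 correct predictions are a made-up figure, since the model's actual accuracy isn't stated in this thread.

```python
import math

def one_sided_p_value(correct, n, p0):
    """Approximate probability of at least `correct` hits out of `n`
    when each guess is right with probability `p0` (normal approximation)."""
    z = (correct - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2))

# Baseline p0 = 0.503: always guessing "odd" is right 5030 times out of 10,000.
# correct=5_100 is hypothetical, purely for illustration.
p = one_sided_p_value(correct=5_100, n=10_000, p0=0.503)
print(f"p-value = {p:.3f}")
```

A small p-value on one block is still weak evidence if many models or many blocks were tried, so the same test should be repeated on digits the model has not seen.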