| > That's an absolute difference of 2.7%. Again, 100% random data. I think I get what you're going for here -- you're trying to simulate a coin flip? -- but what you've actually done is made successive draws from a uniform random number generator. The software is designed to return numbers that fall along the interval [0,1) with equal probability. Thresholding the numbers and dividing their counts is not a meaningful transformation; the result is still just a uniformly distributed random number. It's like...the ratio of heads in two identical, unfair coins or something. If all "random numbers" were uniform like this, then no, we wouldn't expect an X% difference to be any more or less likely based on the magnitude of the underlying sample. But when we're talking about something like a population mean, then the behavior of the errors on estimates is very different indeed, and most estimates cluster around the true (aka population) value: https://online.stat.psu.edu/stat415/lesson/9/9.4 As the sample size for an experiment of this sort gets larger, the bell curve of expected errors gets sharper and sharper, and it becomes increasingly less likely to see errors >= X, for any value X. In the limit of large N, the distribution of sample errors around a known mean approaches a normal distribution: https://www.jmp.com/en_us/statistics-knowledge-portal/t-test... For what it's worth, the expected proportion of N heads in M coin flips is modeled using the binomial distribution, which is also bell-shaped and illustrates the same idea: https://en.wikipedia.org/wiki/Binomial_distribution |
This is wrong. Thresholding is a very meaningful transformation: it is the standard way (https://stats.stackexchange.com/questions/240338) to turn a uniform distribution into a Bernoulli distribution.
Drawing a single value from a Bernoulli distribution is called a Bernoulli trial (https://en.wikipedia.org/wiki/Bernoulli_trial). Counting the successes over repeated trials gives you a Binomial distribution (see your own Wikipedia link).
Long story short: GP's code is a perfectly valid way of sampling the Bernoulli distribution. It is inefficient because it needs one uniform random value per trial, but it mimics the actual process happening in real life, which makes it easier to understand than generating a Binomial sample from the Binomial distribution's CDF.
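For anyone who wants to check this themselves, here is a minimal sketch of the thresholding approach. It is not GP's actual code; the function names, the seed, and the repeat count are my own choices for illustration. It thresholds uniform draws to get Bernoulli trials, sums them into a Binomial sample, and measures how much the fraction of heads varies across repeated experiments.

```python
import random


def bernoulli(p, rng):
    """One Bernoulli(p) trial: threshold a uniform draw from [0, 1)."""
    return 1 if rng.random() < p else 0


def binomial(n, p, rng):
    """One Binomial(n, p) sample: count successes in n Bernoulli trials."""
    return sum(bernoulli(p, rng) for _ in range(n))


def heads_fraction_spread(n, p=0.5, repeats=500, seed=0):
    """Standard deviation of the heads fraction over `repeats` experiments.

    `repeats` and `seed` are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    fractions = [binomial(n, p, rng) / n for _ in range(repeats)]
    mean = sum(fractions) / repeats
    variance = sum((f - mean) ** 2 for f in fractions) / repeats
    return variance ** 0.5


if __name__ == "__main__":
    for n in (100, 1600):
        print(n, heads_fraction_spread(n))
```

The spread should come out close to sqrt(p(1-p)/n), so multiplying the number of flips by 16 should cut it by roughly a factor of 4. That is the sample-size dependence the parent says thresholded uniform draws don't have.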