by carabiner 1344 days ago
3% more/less likely is a tiny effect. What is the power of this result? Isn't there always noise in this type of thing?

I bet the normal variation in responses for the same group submitted to separate groups of companies exceeds 3%. That is, I could send 1,000 resumes to one batch of companies and 10.0% would get interviews. I could send the same 1,000 resumes to a different batch, and 10.3% would get interviews. Boom 3% difference for the same candidates.
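To put a rough number on that intuition, here is a minimal sketch. It assumes each of the 1,000 applications in a batch is an independent trial with the same true 10% interview rate, which is a simplification; the function name is purely illustrative.

```javascript
// Standard error of the difference between two independent sample proportions,
// each estimated from n trials that share the same true rate p.
// Assumption: applications behave like independent Bernoulli trials.
function seOfDifference(p, n) {
  return Math.sqrt((2 * p * (1 - p)) / n);
}

const se = seOfDifference(0.10, 1000); // roughly 1.3 percentage points
const observedGap = 0.103 - 0.100;     // 10.3% vs 10.0%
const gapInSe = observedGap / se;      // the gap measured in standard errors

console.log(se.toFixed(4), gapInSe.toFixed(2));
```

Under those assumptions the 10.0% vs 10.3% gap is only a fraction of one standard error, so on its own it would be indistinguishable from batch-to-batch noise.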

3 comments

> 3% more/less likely is a tiny effect. What is the power of this result? Isn't there always noise in this type of thing?

A result does not have "power". An experiment has power -- the ability to detect a given effect size a certain percentage of the time -- but a result is either statistically significant, or it is not.

As for "noise", statistical significance takes random noise into account. That is the point of the calculation -- it asks if a given result exceeds the threshold of what you'd expect to find at random some percentage of the time. If it does, the result is deemed significant.

A 3% difference could be enormous, or it could be minuscule. We can't say anything based on this information alone, and certainly can't say it's "likely a tiny effect". On a sample of thousands, a 3% difference is big. On a sample of tens, a 3% difference is small.
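How big "big" is depends on whether the 3% is read as absolute (percentage points) or relative. As a rough sketch, assuming a hypothetical 10% baseline rate, two equally sized groups, and a normal approximation, this solves for the per-group sample size at which a gap reaches 1.96 standard errors. It is a significance threshold, not a power calculation, and `sampleSizeForGap` is an illustrative name.

```javascript
// Per-group sample size at which a gap between two rates equals z standard errors.
// Derived from z = gap / sqrt(2 * p * (1 - p) / n), solved for n.
// Assumption: both groups have n trials and a rate close to the baseline p.
function sampleSizeForGap(p, gap, z = 1.96) {
  return (2 * p * (1 - p) * z * z) / (gap * gap);
}

const baseline = 0.10; // hypothetical baseline rate
const nAbsolute = sampleSizeForGap(baseline, 0.03);  // 3 percentage points: 10% vs 13%
const nRelative = sampleSizeForGap(baseline, 0.003); // 3% relative: 10.0% vs 10.3%

console.log(Math.round(nAbsolute), Math.round(nRelative));
```

Under these assumptions the absolute reading needs a sample in the hundreds per group, while the relative reading needs tens of thousands.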

>On a sample of thousands, a 3% difference is big.

Not really. Only if it is many, many thousands. Assuming a totally random acceptance rate of 1/5:

   let a = 0;
   let b = 0;
   for (let i = 0; i < 1000; i++) {
       // each draw "succeeds" with probability 0.2
       if (Math.random() > .8)
           a++;
       if (Math.random() > .8)
           b++;
   }
   console.log(`a=${a}, b=${b}, a is ${(a/b - 1)*100}% more likely than b`)
   > a=209, b=201, a is 3.9800995024875663% more likely than b

Literally the first run. And even in absolute terms, I got this on the third run:

   > a=192, b=219, a is -12.328767123287676% more likely than b

That's an absolute difference of 2.7%. Again, 100% random data.
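One or two runs are anecdotes, so here is a sketch that repeats the same experiment many times and counts how often the relative gap is 3% or more. It swaps `Math.random()` for a small seeded generator (mulberry32) so the result is reproducible; the seed, the run count, and the helper names are arbitrary choices, and `rng() < p` is equivalent to the `> .8` check for p = 0.2.

```javascript
// Small seeded PRNG (mulberry32); Math.random() cannot be seeded.
function mulberry32(seed) {
  let s = seed >>> 0;
  return function () {
    s = (s + 0x6d2b79f5) >>> 0;
    let t = s;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Number of successes in n independent trials, each succeeding with probability p.
function countHits(n, p, rng) {
  let hits = 0;
  for (let i = 0; i < n; i++) {
    if (rng() < p) hits++;
  }
  return hits;
}

const rng = mulberry32(12345); // arbitrary seed
const runs = 2000;             // arbitrary number of repetitions
let gapAtLeast3Percent = 0;
for (let r = 0; r < runs; r++) {
  const a = countHits(1000, 0.2, rng);
  const b = countHits(1000, 0.2, rng);
  if (Math.abs(a / b - 1) >= 0.03) gapAtLeast3Percent++;
}

const fraction = gapAtLeast3Percent / runs;
console.log(`relative gap of 3% or more in ${(100 * fraction).toFixed(1)}% of runs`);
```

A normal approximation suggests this should come out somewhere around three runs in four, which fits with seeing it on the first try.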
> That's an absolute difference of 2.7%. Again, 100% random data.

I think I get what you're going for here -- you're trying to simulate a coin flip? -- but what you've actually done is made successive draws from a uniform random number generator. The software is designed to return numbers that fall along the interval [0,1) with equal probability. Thresholding the numbers and dividing their counts is not a meaningful transformation; the result is still just a uniformly distributed random number. It's like...the ratio of heads in two identical, unfair coins or something.

If all "random numbers" were uniform like this, then no, we wouldn't expect an X% difference to be any more or less likely based on the magnitude of the underlying sample. But when we're talking about something like a population mean, then the behavior of the errors on estimates is very different indeed, and most estimates cluster around the true (aka population) value:

https://online.stat.psu.edu/stat415/lesson/9/9.4

As the sample size for an experiment of this sort gets larger, the bell curve of expected errors gets sharper and sharper, and it becomes increasingly less likely to see errors >= X, for any value X. In the limit of large N, the distribution of sample errors around a known mean approaches a normal distribution:

https://www.jmp.com/en_us/statistics-knowledge-portal/t-test...
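The sharpening can be made concrete with the usual standard-error formula for a sample proportion. This is only a sketch: the 0.2 rate is borrowed from the simulation upthread, and the sample sizes are arbitrary.

```javascript
// Standard error of a sample proportion from n independent trials with true rate p.
function seOfProportion(p, n) {
  return Math.sqrt((p * (1 - p)) / n);
}

const rate = 0.2; // hypothetical true rate, matching the 1/5 used upthread
const table = [100, 1000, 10000, 100000].map((n) => ({
  n,
  standardError: seOfProportion(rate, n),
  relativeToRate: seOfProportion(rate, n) / rate, // spread as a fraction of the rate
}));

console.table(table);
```

The spread shrinks with the square root of N, so each tenfold increase in sample size narrows the curve by a factor of about 3.2.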

For what it's worth, the expected proportion of N heads in M coin flips is modeled using the binomial distribution, which is also bell-shaped and illustrates the same idea:

https://en.wikipedia.org/wiki/Binomial_distribution

> I think I get what you're going for here -- you're trying to simulate a coin flip? -- but what you've actually done is made successive draws from a uniform random number generator. The software is designed to return numbers that fall along the interval [0,1) with equal probability. Thresholding the numbers and dividing their counts is not a meaningful transformation;

This is wrong. That is a very meaningful transformation. It is the standard way (https://stats.stackexchange.com/questions/240338) to turn a uniform distribution into a Bernoulli distribution.

Getting a single value with a Bernoulli distribution is called a Bernoulli trial (https://en.wikipedia.org/wiki/Bernoulli_trial). Repeating this gives you a Binomial distribution (see your own Wikipedia link).

Long story short: GP's code is a perfectly valid way of sampling the Bernoulli distribution. It is inefficient because it needs so many random values, but it mimics the actual process happening in real life, making it easier to understand than generating a Binomial sample from the Binomial distribution's CDF.
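Spelled out as a sketch, with illustrative function names and an injectable `rng` parameter added only so the behavior can be checked with fixed inputs:

```javascript
// One Bernoulli trial: true with probability p, driven by a uniform draw in [0, 1).
function bernoulli(p, rng = Math.random) {
  return rng() < p;
}

// A binomial(n, p) sample is the number of successes in n independent Bernoulli trials.
function binomialSample(n, p, rng = Math.random) {
  let successes = 0;
  for (let i = 0; i < n; i++) {
    if (bernoulli(p, rng)) successes++;
  }
  return successes;
}

const interviews = binomialSample(1000, 0.2);
console.log(interviews); // varies from run to run
```

This costs n uniform draws per binomial sample, which is the inefficiency mentioned above.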

> This is wrong. That is a very meaningful transformation. It is the standard way (https://stats.stackexchange.com/questions/240338) to turn a uniform distribution into a Bernoulli distribution.

The OP didn't do what was described in the SO post. They did something else -- they calculated the ratio of two binomial random variables, and presented that as a percentage.

Also, no, the SO comment you've cited doesn't describe how to generate a "Bernoulli distribution" (not a thing, btw; it's called a binomial distribution) from a uniform distribution. It tells how to make a single Bernoulli trial...but even that isn't what OP did.

This is how you actually do what you're discussing (draw from the Binomial CDF given a uniform RNG, via a table):

https://math.stackexchange.com/questions/1427288/how-to-samp...
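For comparison, a minimal sketch of inverse-CDF sampling: one uniform draw is mapped to a binomial value by walking the cumulative probabilities. It computes the probabilities on the fly rather than from a precomputed table, so it is not necessarily the method in the linked answer, and it assumes 0 < p < 1. The function name is illustrative.

```javascript
// Map a single uniform draw u in [0, 1) to a binomial(n, p) value by finding
// the smallest k whose cumulative probability P(X <= k) exceeds u.
// Assumption: 0 < p < 1 (the recurrence divides by 1 - p).
function binomialFromUniform(n, p, u) {
  let k = 0;
  let pmf = Math.pow(1 - p, n); // P(X = 0)
  let cdf = pmf;
  while (u >= cdf && k < n) {
    // P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
    pmf *= ((n - k) / (k + 1)) * (p / (1 - p));
    k++;
    cdf += pmf;
  }
  return k;
}

console.log(binomialFromUniform(1000, 0.2, Math.random()));
```

This uses one random number per sample instead of n, at the cost of being less obviously tied to the underlying process.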

> "Bernoulli distribution" (not a thing, btw)

So this Wikipedia article is a fever dream? https://en.wikipedia.org/wiki/Bernoulli_distribution

> they calculated the ratio of two binomial random variables, and presented that as a percentage.

Ok, now I'm confused. I 100% agree with that statement. I thought your whole point was that OP's code was not a valid way to sample from a binomial distribution?

But then what is your criticism? Are you arguing that the Binomial distribution does not model the original experiment correctly?

That's precisely what this is trying to model, yes. The standard computational way to simulate a binary event with probability p is to call rand() and check if rand(0,1) < p (or > 1-p, what I did). Or as you called it, an unfair coin flip.

This model is built on the assumption that if candidates are actually totally equally likely to be picked (the null hypothesis for the experiment above), any given candidate has a p=.2 chance of being hired (given an arbitrary but reasonable hire vs interview ratio of 1:5). Which is just a weighted coin flip. This is indeed a binomial distribution, and my point is that results ±3% of the mean (p*M), even at M=1000, are still fairly probable. When comparing two such results, it's almost expected.
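That claim can be checked without simulation by summing exact binomial probabilities. The sketch below reads "±3% of the mean" as a count at least 6 away from 200 (194 or fewer, or 206 or more), which is an interpretation; the helper name is illustrative.

```javascript
// Exact binomial(n, p) probabilities for k = 0..n, built with the recurrence
// P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p).
function binomialPmf(n, p) {
  const pmf = new Array(n + 1);
  pmf[0] = Math.pow(1 - p, n);
  for (let k = 0; k < n; k++) {
    pmf[k + 1] = pmf[k] * ((n - k) / (k + 1)) * (p / (1 - p));
  }
  return pmf;
}

const M = 1000;
const p = 0.2;
const mean = 200; // p * M
const offBy = 6;  // 3% of the mean, as a count

const pmf = binomialPmf(M, p);
let probOutside = 0;
for (let k = 0; k <= M; k++) {
  if (Math.abs(k - mean) >= offBy) probOutside += pmf[k];
}

console.log(`P(at least 3% away from the mean) = ${probOutside.toFixed(3)}`);
```

A normal approximation puts this at roughly two in three for a single result, and comparing two such results only widens the spread.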

The part where you did rand() > 0.8 ? 1 : 0 is fine. That's a Bernoulli trial with p=0.2.

The part where you did this in a loop, with two calls per iteration, and then divided the counts and called it a percentage is wrong. It's certainly not a Binomial distribution. It's just the ratio of two binomial random variables.

Yeah I wouldn’t think anything of this if it were 3% the other direction… that’s a measurement error.
In that case n=1, not n=1000?