Hacker News
by PheonixPharts 1518 days ago
It's a normal approximation of the uncertainty (expressed as a standard deviation) in the estimate of the mean of 200 Bernoulli random variables. Each nurse's response is considered an observation of a Bernoulli distributed random variable, and we are trying to determine the rate of that variable.
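As a rough sketch of that calculation (the function name is mine, and I'm assuming the 180 'yes' out of 200 responses used elsewhere in this comment), the standard error of the estimated rate is sqrt(p * (1 - p) / n):

```python
import math

def proportion_standard_error(yes, n):
    """Return (p_hat, se) for a sample proportion under the normal approximation."""
    p_hat = yes / n
    # Standard deviation of the mean of n Bernoulli(p_hat) observations.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, se

# Assumed counts: 180 'yes' answers out of 200 nurses.
p_hat, se = proportion_standard_error(180, 200)

# Approximate 95% interval using the usual z = 1.96 multiplier.
z = 1.96
interval = (p_hat - z * se, p_hat + z * se)

print(f"p_hat = {p_hat:.3f}, se = {se:.4f}")  # se is about 0.0212
print(f"approx 95% interval: {interval[0]:.3f} to {interval[1]:.3f}")
```

Note that n enters through the denominator, which is why a survey of 100,000 nurses gives a much tighter estimate than a survey of 3.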

You are incorrect that "n is 1" since, by that logic, one survey talking to 100,000 nurses would be the same as one talking to 3.

If you would like an alternative, more Bayesian formulation, we can use the Beta distribution, which is parameterized by alpha (number of 'yes') and beta (number of 'no').

This approach is a bit more intuitive than the Frequentist method since it answers the question "what do we believe to be the expected rate of nurses answering 'yes'?"

In this case alpha=180 and beta=20, and we'll include a uniform prior of alpha_prior = 1 and beta_prior = 1.

For Betas the posterior is defined quite nicely as:

Beta(alpha_posterior, beta_posterior) = Beta(alpha_likelihood + alpha_prior, beta_likelihood + beta_prior)

In general for Beta distributions we can compute the expectation as:

E[Beta(alpha,beta)] = alpha/(alpha + beta)

In this case: 181/202 = ~0.9

And the variance of a Beta distributed random variable is:

Var[Beta(alpha, beta)] = (alpha*beta)/((alpha+beta)^2 * (alpha + beta + 1))

Which for our case is:

0.00046

and the standard deviation of this is just its square root:

0.021

Which gives us the same answer as we get with the normal approximation.
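If you want to check the arithmetic yourself, here is a minimal sketch of the Beta update using the formulas above (the function names are just for illustration, nothing standard):

```python
import math

def beta_posterior(yes, no, alpha_prior=1.0, beta_prior=1.0):
    """Conjugate update: add the observed counts to the prior parameters."""
    return yes + alpha_prior, no + beta_prior

def beta_mean(alpha, beta):
    """E[Beta(alpha, beta)] = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Var[Beta(alpha, beta)] = alpha*beta / ((alpha+beta)^2 * (alpha+beta+1))."""
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1))

# 180 'yes' and 20 'no', with the uniform Beta(1, 1) prior.
alpha_post, beta_post = beta_posterior(180, 20)
post_mean = beta_mean(alpha_post, beta_post)
post_var = beta_variance(alpha_post, beta_post)
post_sd = math.sqrt(post_var)

print(f"posterior: Beta({alpha_post:g}, {beta_post:g})")
print(f"mean = {post_mean:.3f}, var = {post_var:.5f}, sd = {post_sd:.3f}")
```

The posterior standard deviation comes out at about 0.021, in line with sqrt(0.9 * 0.1 / 200) from the normal approximation.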


Thank you for taking the time to explain your modelling. Unfortunately I will need to read more on this topic, because I do not understand the intuition behind the priors "uniform prior of alpha_prior = 1, and beta_prior = 1".

The way I would generally approach such a problem is by running Monte Carlo simulations. Assuming the true rate of nurses quitting is X, what is the chance that a random sample of 200 nurses has the expectation of quitting >= 90%. To get the lower bound of the confidence interval, I will run this simulation for several values of X, starting at say X=60%, increasing until I get >95% chance that a random sample of 200 nurses has E(quitting) > 90%. Do you think this approach makes sense?

Simulations are fantastic, and often necessary for tricky statistics problems. However, what you are describing reinvents so much of the wheel that you are going to spend multiple orders of magnitude more computation to get an approximately correct solution. Your plan also has some conceptual errors.

For example

> Assuming the true rate of nurses quitting is X, what is the chance that a random sample of 200 nurses has the expectation of quitting >= 90%.

You have just described the Binomial distribution [0], which is probably the most elementary distribution you learn about when studying probability and statistics (even the Bernoulli is just a special case of it). There's no need to run simulations to answer this particular question.

There are also some fundamental misunderstandings with your approach:

> increasing until I get >95% chance that a random sample of 200 nurses has E(quitting) > 90%.

The probability of getting > 90% 'yes/quitting' (i.e. more than 180 out of 200) if the true probability of 'yes' is in fact 0.9 is only 0.46. You won't cross your threshold of 95% here until you reach X=0.933.
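Both of those numbers can be computed directly from the Binomial PMF, no simulation needed. A small sketch using only the standard library (the helper names and the 0.001 scan step are my own choices, so the crossing point is only resolved to three decimals):

```python
from math import comb

def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), summed exactly from the PMF."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1, n + 1))

def first_rate_exceeding(target, k, n, start_permille=600):
    """Scan p upward in steps of 0.001 and return the first p with P(X > k) > target."""
    for permille in range(start_permille, 1000):
        p = permille / 1000
        if prob_more_than(k, n, p) > target:
            return p
    return None  # target never reached on the scanned grid

# Chance of more than 180 'yes' out of 200 when the true rate is 0.9.
tail_at_090 = prob_more_than(180, 200, 0.9)

# Smallest scanned rate at which that chance exceeds 95%.
threshold = first_rate_exceeding(0.95, 180, 200)

print(f"P(X > 180 | p = 0.9) = {tail_at_090:.3f}")
print(f"first p with P(X > 180) > 0.95: {threshold}")
```

The scan starts at 60% only to mirror the starting point in your plan.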

If you wanted to construct the 95% CI from pure simulation, a better approach would be to sample 200 observations from a Bernoulli random variable with rate 0.9 (just sample from a uniform and check whether it's less than 0.9), compute the mean of the samples, and repeat this 10,000 or so times. Then build the empirical CDF [1] of those means (fairly easy to implement in code) and read off the 2.5% and 97.5% quantiles. Those are your bounds, and they will be the same as the ones I posted within some epsilon.
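Here is roughly what that simulation could look like. Treat it as a sketch: the function names, the fixed seed, and the simple index-based quantile lookup are my own choices, not the only way to do it.

```python
import random

def simulate_sample_means(p=0.9, n=200, reps=10_000, seed=0):
    """Draw `reps` samples of n Bernoulli(p) observations and return their means."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    means = []
    for _ in range(reps):
        # A uniform draw below p counts as a 'yes'.
        yes = sum(rng.random() < p for _ in range(n))
        means.append(yes / n)
    return means

def quantile_bounds(values, lower=0.025, upper=0.975):
    """Read the lower and upper quantiles off the sorted (empirical) distribution."""
    ordered = sorted(values)
    lo_idx = int(lower * (len(ordered) - 1))
    hi_idx = int(upper * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

means = simulate_sample_means()
low, high = quantile_bounds(means)
print(f"simulated 95% bounds: {low:.3f} to {high:.3f}")
```

Because each mean is a multiple of 1/200, the simulated bounds move in steps of 0.005, which is part of the "within some epsilon".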

If you're seriously interested in understanding this, I do recommend picking up a basic probability/stats book and working your way through it.

0. https://en.wikipedia.org/wiki/Binomial_distribution

1. https://en.wikipedia.org/wiki/Empirical_distribution_functio...

Did everyone ignore the word 'considered'?