Hacker News
by seaman1921 1518 days ago
Thank you for taking the time to explain your modelling. Unfortunately I will need to read more on this topic, because I do not understand the intuition behind the priors "uniform prior of alpha_prior = 1, and beta_prior = 1".

The way I would generally approach such a problem is by running Monte Carlo simulations. Assuming the true rate of nurses quitting is X, what is the chance that a random sample of 200 nurses has the expectation of quitting >= 90%. To get the lower bound of the confidence interval, I will run this simulation for several values of X, starting at say X=60%, increasing until I get >95% chance that a random sample of 200 nurses has E(quitting) > 90%. Do you think this approach makes sense?

1 comment

Simulations are fantastic, and often necessary for tricky statistics problems. However, what you are describing reinvents so much of the wheel that you would spend multiple orders of magnitude more computation to get an approximately correct solution. You also have some conceptual errors in your plan.

For example:

> Assuming the true rate of nurses quitting is X, what is the chance that a random sample of 200 nurses has the expectation of quitting >= 90%.

You have just described the Binomial distribution [0], which is probably the most elementary distribution you learn about when studying probability and statistics (even the Bernoulli is just a special case of it). There's no need to run simulations to answer this particular question.
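As a rough sketch of what "no simulation needed" means here: the tail probability can be summed directly from the Binomial pmf. The function names below (binom_pmf, binom_tail) and the 0.85 rate in the usage line are my own illustrative choices, not anything from the thread, and I'm reading ">= 90%" of 200 as "at least 180".

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k_min, n, p):
    """P(K >= k_min) for K ~ Binomial(n, p), by summing the pmf exactly."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# Chance that at least 180 of 200 sampled nurses quit,
# assuming (hypothetically) a true quit rate of 0.85.
print(binom_tail(180, 200, 0.85))
```

This is a single sum of at most 201 terms per value of X, versus thousands of simulated samples per value of X.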

There are also some fundamental misunderstandings with your approach:

> increasing until I get >95% chance that a random sample of 200 nurses has E(quitting) > 90%.

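The probability of getting > 90% 'yes/quitting' (i.e. more than 180 out of 200) if the true probability of 'yes' is in fact 0.9 is only about 0.46. You won't cross your threshold of 95% until you reach roughly X=0.933, so the procedure you describe would not give you a bound anywhere near the value you are looking for.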
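If you want to check those two numbers yourself, here is a minimal sketch that uses only the exact Binomial tail. The helper names (prob_more_than, first_rate_crossing) and the 0.001 grid step are assumptions of mine; the crossing point you get depends on the grid you scan.

```python
from math import comb

def prob_more_than(k, n, p):
    """P(K > k) for K ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1, n + 1))

def first_rate_crossing(k, n, threshold, start=0.9):
    """Scan X upward in steps of 0.001 from `start` and return the first X
    with P(K > k) > threshold, or None if no X below 1 crosses it."""
    for i in range(round(start * 1000), 1000):
        x = i / 1000
        if prob_more_than(k, n, x) > threshold:
            return x
    return None

# Probability of more than 180 quitting out of 200 when the true rate is 0.9
print(prob_more_than(180, 200, 0.9))

# First true rate X at which that probability exceeds 95%
print(first_rate_crossing(180, 200, 0.95))
```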

If you wanted to construct the 95% CI from pure simulation, a better approach would be to sample 200 observations from a Bernoulli random variable with p = 0.9 (just sample from a uniform and check whether it's less than 0.9), compute the mean of the sample, and repeat this 10,000 or so times. Then look at the empirical CDF [1] of those means (fairly easy to implement in code) and read off the 2.5th and 97.5th percentiles, and you have your bounds (which will be the same as the ones I posted within some epsilon).
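A minimal sketch of that simulation, standard library only. The function name simulate_ci, the fixed seed, and the way I index into the sorted means to get the percentiles are my own choices; treat it as one reasonable way to do it rather than the exact procedure behind any numbers in this thread.

```python
import random

def simulate_ci(p=0.9, n=200, reps=10_000, seed=0):
    """Repeatedly draw n Bernoulli(p) observations, record each sample mean,
    and return the empirical 2.5th and 97.5th percentiles of those means."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    means = []
    for _ in range(reps):
        # A uniform draw below p counts as a 'yes/quitting' observation.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        means.append(successes / n)
    means.sort()  # the sorted means are the empirical CDF's support
    lower = means[int(0.025 * reps)]
    upper = means[int(0.975 * reps) - 1]
    return lower, upper

lower, upper = simulate_ci()
print(lower, upper)
```

Because each sample mean is a multiple of 1/200, the bounds move in steps of 0.005, so small differences between runs or seeds are expected.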

If you're seriously interested in understanding this, I do recommend picking up a basic probability/stats book and working your way through it.

0. https://en.wikipedia.org/wiki/Binomial_distribution
1. https://en.wikipedia.org/wiki/Empirical_distribution_functio...

Did everyone ignore the word 'considered'?