Hacker News
by justQandA 2054 days ago
This looks really neat, but unfortunately I don't have the background knowledge to understand what this is even visualizing. Is there a tutorial that would help me acquire the math/background knowledge needed to comprehend what's being modeled?

For example, what are "Power", "Alpha", "n", and "d"?

Also, what are "Type I" and "Type II" errors? What's the intuition for how these all relate to each other?

Is there a blog post or chapter that explains all of this? Or would this require significant learning, i.e. a full semester course or a textbook?


I'm going to try my best to explain these in an intuitive way. Statistics has lots of terms with names that are arbitrary and confusing.

To set the context, we are trying to use data to help us test a hypothesis. An example might be: "if we give this pill to a person, they will be cured of their disease". Statisticians test this by setting up two groups: Group A gets nothing (or a placebo), Group B gets the pill.

In statistics, you start by assuming the "null hypothesis": that there is no difference between the two groups. You use hypothesis testing to help you "reject" the null hypothesis, i.e. to say that the groups really are different. If the groups are different, that is evidence the pill has an effect on the disease. So we collect data about the two groups, run some math on that data, and use the result of that math to help us decide whether we can reject the null hypothesis.
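A minimal sketch of the "run some math" step, assuming a permutation test on a made-up numeric outcome (coughs per day per patient). A t-test is the more common textbook choice; a permutation test is used here only because it needs nothing beyond Python's standard library and makes the null hypothesis concrete: if the pill does nothing, the "placebo" and "pill" labels are interchangeable, so shuffling them should produce differences as large as the real one fairly often. The function name, the data, and the alpha = 0.05 cutoff are illustrative assumptions.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Under the null hypothesis the group labels don't matter, so we shuffle
    them and count how often a shuffled difference is at least as extreme
    as the difference we actually observed.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_b) - statistics.fmean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up data: coughs per day for each patient in each group.
placebo = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
pill = [9, 10, 8, 11, 9, 12, 10, 9, 11, 10]

diff, p = permutation_test(placebo, pill)
alpha = 0.05  # assumed significance threshold
print(f"difference in means: {diff:.1f}, p-value: {p:.4f}")
print("reject the null" if p < alpha else "fail to reject the null")
```

The p-value here is the fraction of label shuffles that look at least as extreme as the real data; a small value means the observed difference would be surprising if the pill did nothing.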

Statistics is a bunch of tradeoffs between certainty, making the wrong call, and data volume. The terms you have mentioned are either "knobs" (tradeoffs) we can adjust or measures that help us understand our results.

Here's what those terms mean:

Type 1 Error: also known as "False Positive". You thought the pill cured the disease, but it does not.

Type 2 Error: also known as "False Negative". You thought the pill did nothing, but it actually works.

Power: the chance of avoiding a Type 2 error (false negative) when the pill really does have an effect of a given size. The higher your power, the lower the chance that you incorrectly conclude your pill is ineffective.
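The Type 1, Type 2, and power definitions can be checked by simulation: run many fake trials where you know the truth and count how often the test gets it wrong. This sketch assumes normally distributed outcomes with a known standard deviation of 1, so a simple two-sample z-test can be used (real trials usually use a t-test), and assumes a true effect of 0.5 standard deviations with 30 patients per group. All function names and numbers are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def z_test_rejects(group_a, group_b, sigma=1.0, alpha=0.05):
    """Two-sided two-sample z-test, assuming the population SD sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.fmean(group_b) - statistics.fmean(group_a)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = diff / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical


def rejection_rate(true_effect, n, trials=5_000, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        placebo = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pill = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        rejections += z_test_rejects(placebo, pill, alpha=alpha)
    return rejections / trials


# The pill does nothing: every rejection is a Type 1 error (false positive).
type1_rate = rejection_rate(true_effect=0.0, n=30)
# The pill really works: every failure to reject is a Type 2 error (false negative).
power = rejection_rate(true_effect=0.5, n=30)
print(f"Type 1 error rate: {type1_rate:.3f} (alpha was set to 0.05)")
print(f"power: {power:.3f}, Type 2 error rate: {1 - power:.3f}")
```

When the pill does nothing, the rejection rate lands near the chosen alpha, which is what alpha controls. When it works, the rejection rate is the power, and one minus that is the Type 2 error rate. Raising n in the second call is a quick way to see power go up.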

N: The number of "observations", in our case, the number of patients in the trial for our pill.

The others are a little trickier to explain.

Alpha: Statisticians use a "confidence interval" as a way to communicate how uncertain they are about a particular result. In our trial we might say "patients were 15% less likely to have the disease after taking the pill, give or take 2%". We don't think the decrease is exactly 15% (what we observed) but is instead somewhere in that neighborhood. Alpha is a measure of the chance the real effect is OUTSIDE of your confidence interval. So in this case, the chance the effect is < 13% or > 17%.

Cohen's d: In our trial, we might measure "the number of times the patient coughed in a day" in addition to "do they have the disease anymore, yes/no". To compare our two groups, we might look at the average number of coughs per day in group A vs group B. This is called measuring the "difference in means". Cohen's d takes that difference in means and divides it by the (pooled) standard deviation, so it expresses the difference in units of how spread out the data is. That makes it a standardized "effect size" you can compare across studies.
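Here is a sketch of how d ties the other terms together. Cohen's d is computed as the difference in means over the pooled standard deviation; once you fix d, n, and alpha, the power is determined, so you can also solve for the n needed to reach a target power. The power formula is a normal approximation for a two-sided, two-sample test with equal group sizes (an assumption; exact t-test calculations give slightly larger n, e.g. 64 per group rather than 63 for d = 0.5 at 80% power). Function names and the cough data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def cohens_d(group_a, group_b):
    """Difference in means (b minus a) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_b) - statistics.fmean(group_a)) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided two-sample test."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)  # expected z-statistic if the effect is real
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


def n_for_power(d, target_power=0.8, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target_power:
        n += 1
    return n


# Made-up data: coughs per day for each patient in each group.
placebo = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
pill = [9, 10, 8, 11, 9, 12, 10, 9, 11, 10]

print(f"Cohen's d: {cohens_d(placebo, pill):.2f}")  # negative: fewer coughs with the pill
print(f"power with d = 0.5, n = 30 per group: {approx_power(0.5, 30):.2f}")
print(f"n per group for 80% power at d = 0.5: {n_for_power(0.5)}")
```

By Cohen's rough conventions, d = 0.2 is "small", 0.5 "medium", and 0.8 "large". Smaller effects need many more patients to detect reliably, which is the certainty/data-volume tradeoff in practice.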

> Alpha: Statisticians use a "confidence interval" as a way to communicate how uncertain they are about a particular result. In our trial we might say "patients were 15% less likely to have the disease after taking the pill, give or take 2%". We don't think the decrease is exactly 15% (what we observed) but is instead somewhere in that neighborhood. Alpha is a measure of the chance the real effect is OUTSIDE of your confidence interval. So in this case, the chance the effect is < 13% or > 17%.

I know this sounds intuitive, but it is wrong.

The true effect is not a random variable.

The random variable is the statistic.

When we say "95% confidence interval", we mean that if we repeated the experiment many times and built an interval from each sample in the same way, 95% of those intervals would contain the true effect. It is not the chance that the true value is in the specific confidence interval you constructed.

Edit: the latter is either 0 or 1, but you don't get to find out which from a single test.
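The "95% of intervals" reading can be seen directly by simulation: fix a true mean, draw many samples, build an interval from each, and count how many intervals contain the true value. This sketch assumes normal data with a known standard deviation, so it uses a simple z-interval; the true mean, sigma, and sample size are arbitrary illustrative values. In this framing alpha is 1 minus the confidence level (0.05 for a 95% interval), which is also the Type 1 error rate of the matching two-sided test.

```python
import random
import statistics
from statistics import NormalDist


def coverage(true_mean=10.0, sigma=2.0, n=25, confidence=0.95, trials=10_000, seed=2):
    """Fraction of intervals, each built from a fresh sample, that contain the
    fixed true mean. Assumes sigma is known (z-interval)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)  # the statistic is what varies between samples
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The true mean never changes; only the intervals move around from sample to sample. Any single interval either contains it or doesn't.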

I knew I’d get something wrong, thanks for the correction.
I absolutely loathe statistics terminology. It’s such a roadblock for people and an example of erudite traditionalism not being challenged enough.

(Although the terms in question here are fairly mild, and not knowing them probably says more about how the vocabulary in general blocks people from learning stats than about these particular terms.)

Can you recommend any good books or resources that give explanations at a similar level to this, ideally with a hint of data science and ML thrown in?
How to Lie with Statistics and The Cartoon Guide to Statistics might interest you.
This was amazingly helpful. Thank you!
It’ll probably require 2-3 semesters of stats to really understand; most intro courses barely cover the basics and don’t even reach power. You really have to apply regression to many real datasets to truly understand the concepts.