by throw16916145 2984 days ago
>they only test null hypothesis that are true.

If a null hypothesis is invariably true, it's impossible to reject it. Which means the scientists will not be able to find any statistic or data to support any of their bad, original hypotheses. Not 5%, not 0.005%, nor whatever.

p-values are not flawed. They are a useful tool for a certain category of jobs: namely to check how likely your sample is, given a certain hypothesis.

The argument in the original post is a bit of a straw man fallacy.

"I want to know the probability that the null is true given that an observed effect is significant. We can call this probability "p(null | significant effect)"."

OK, hypothesis testing can't answer this type of question.

Then "However, what NHST actually tells me is the probability that I will get a significant effect if the null is true. We can call this probability "p(significant effect | null)"."

Not quite correct. It's "p(still NOT a significant effect whatever it means | null)".

EDIT. Fixed the last sentence.
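For what it's worth, the gap between the two quoted probabilities can be sketched with Bayes' rule. This is only an illustration: the function name `p_null_given_sig`, the 80% power, and the prior values are assumptions made up for the example, not anything from the original post.

```r
# Sketch: p(null | significant effect) via Bayes' rule.
# All inputs are hypothetical and chosen only for illustration.
p_null_given_sig <- function(prior_null, alpha = 0.05, power = 0.80) {
  # alpha = p(significant effect | null)
  # power = p(significant effect | alternative), an assumed value
  p_sig <- alpha * prior_null + power * (1 - prior_null)
  alpha * prior_null / p_sig
}

p_null_given_sig(0.5)  # about 0.059 when half of all tested nulls are true
p_null_given_sig(1)    # exactly 1 when every tested null is true
```

The second call is the "they only test null hypotheses that are true" scenario: p(significant effect | null) stays at 5%, but p(null | significant effect) is 100%.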


> If a null hypothesis is invariably true, it's impossible to reject it. Which means the scientists will not be able to find any statistic or data to support any of their bad, original hypotheses. Not 5%, not 0.005%, nor whatever.

Why argue when you can simulate:

    > n <- 50
    > simulations <- 10000
    > sd <- 1
    > se <- sd/sqrt(n)
    > crit <- 1.96 * se
    > mean(abs(colMeans(sapply(rep(n, simulations), rnorm))) > crit)
    [1] 0.0494

Lo and behold, we reject the null hypothesis that the mean of a normal distribution is equal to zero in 5% of all simulations, even though the null hypothesis is in fact true. (`rnorm` defaults to 0 mean and 1 sd.)

It's always refreshing to meet a fellow R hacker on HN!

May I ask you why you chose to use the normal distribution in your example or any distribution at all, for that matter? What I was replying to was

">they only test null hypothesis that are true."

Which means that the null hypothesis is always true no matter what data you collect trying to reject it. It does not depend on the null distribution (normal in your example), the value of the test statistic (the mean of the sample in your example), or the threshold (crit in your example). In fact, the null distribution in this case is not a distribution at all since there's no randomness in the null hypothesis. We know for a fact that it is always true (in the hypothetical situation we are considering).

It's more like

     > rep(FALSE, simulations) # is the null hypothesis false? nope
or, if you insist on using the normal distribution,

     > abs(colMeans(sapply(rep(n, simulations), rnorm))) > +Inf

In fact, in your example, since you are essentially running 1000 hypothesis tests on different samples, multiple hypothesis correction would solve the "problem" with p-value. This is how I would do it.

     > n <- 50
     > simulations <- 10000
     > x <- sapply(rep(n, simulations), rnorm)
     > p <- sapply(apply(x, 2, FUN=t.test), function(tt) tt$p.value)
     > pa <- p.adjust(p, method="fdr")
     > library(boot)
     > boot.out <- boot(pa, function(d, i) mean(d[i]), R=1000)
     > boot.ci(boot.out, conf=0.95, type="basic")
     BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
     Based on 1000 bootstrap replicates

     CALL :
     boot.ci(boot.out = boot.out, type = "basic")

     Intervals :
     Level      Basic
     95%   ( 0.9774, 0.9780 )
     Calculations and Intervals on Original Scale

P.S. p-values are great when used appropriately.

> May I ask you why you chose to use the normal distribution in your example or any distribution at all, for that matter?

The distribution is not important, any other data generator would do.

> Which means that the null hypothesis is always true no matter what data you collect trying to reject it.

The idea behind the thought experiment was that we live in a world in which researchers always investigate things that will turn out not to exist / be real, but the researchers themselves don't know this! Otherwise they wouldn't bother to run the investigations in the first place.

> In fact, in your example, since you are essentially running 1000 hypothesis tests on different samples, multiple hypothesis correction would solve the "problem" with p-value.

They're not multiple tests. They're multiple simulations of the same test, to show how the test performs in the long run.
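A minimal sketch of that long-run reading, using `t.test` instead of the hand-computed critical value from my first snippet. The seed and the 0.05 cutoff are arbitrary choices for the example.

```r
# Repeat the same test many times on fresh data where the null is true,
# then look at how often that single test rejects.
set.seed(1)           # arbitrary seed, only for reproducibility
n <- 50
simulations <- 10000

# One p-value per simulated sample; the true mean is 0 in every run.
p <- replicate(simulations, t.test(rnorm(n))$p.value)

mean(p < 0.05)        # long-run rejection rate, should land near 0.05
```

Each p-value here answers the same question about a new draw, so the quantity of interest is the rejection rate of one test, not a family of different hypotheses that needs correcting.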

Perhaps you're a wonderful statistician, I wouldn't know, but nothing you have said thus far about null hypothesis significance testing makes any sense or is even remotely correct.

> If a null hypothesis is invariably true, it's impossible to reject it. Which means the scientists will not be able to find any statistic or data to support any of their bad, original hypotheses. Not 5%, not 0.005%, nor whatever.

You've never heard of random error? Just because a null hypothesis accurately describes a data-generating process doesn't mean you will never get samples that are skewed enough to show a significant effect.

Pretend we are comparing neighborhoods. Say the true average age of the people in my neighborhood and in your neighborhood is actually equal, at 40, but my alternative hypothesis is that the average age of residents in my neighborhood is lower than in yours (thus the null is that they are the same, which unbeknownst to me is the truth). You are claiming that no matter how many random samples of residents of our two neighborhoods we take, they will always be close enough in average age that we will always fail to reject the null. That's obviously not the case.

In fact, by definition, a 5% significance threshold means we should expect 5% of the samples we draw to indicate my neighborhood is significantly younger than yours, even though that isn't true, solely due to the randomness of our samples. That's literally what the p-value cutoff is for.
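Here is a rough simulation of the neighborhood example. The sample size, the spread of ages, the normal distribution, and the seed are all assumptions picked for the sketch; only the shared true mean of 40 comes from the example above.

```r
# Two neighborhoods with the same true average age, sampled repeatedly.
set.seed(42)          # arbitrary seed, only for reproducibility
n <- 100              # residents sampled per neighborhood (assumed)
true_mean <- 40       # identical in both neighborhoods, so the null is true
age_sd <- 12          # assumed spread of ages
simulations <- 5000

reject <- replicate(simulations, {
  mine  <- rnorm(n, mean = true_mean, sd = age_sd)
  yours <- rnorm(n, mean = true_mean, sd = age_sd)
  # One-sided test of "my neighborhood is younger than yours"
  t.test(mine, yours, alternative = "less")$p.value < 0.05
})

mean(reject)          # share of sample pairs that look "significantly younger"
```

The share should come out near 0.05 even though the two populations are identical by construction.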