Hacker News
by cwyers 3810 days ago
The short version is that a p-value has nothing to do with establishing a causal mechanism. It's a test of statistical significance; it doesn't try to say whether that significance arises because x causes y, y causes x, or some unknown variable z causes both x and y.

The long version: So, in this case, we have two variables, Metacritic score (or Rotten Tomatoes percentage) and box office gross. We have measured the correlation between them for some number N of movies. N is smaller than the total population of movies that could be evaluated; if nothing else, the analysis doesn't consider movies that haven't been released yet. So the movies evaluated are considered, for the purposes of a p-value test, to be a sample of size N from a hypothetical infinite population of movies with box office gross and Metacritic scores. The p-value test also assumes that there's something called a null hypothesis, which here is that there is no relationship between Metacritic scores and box office gross. The null hypothesis is the hypothesis that all other hypotheses are evaluated against.

What the p-value measures (in this case; it can be applied to other statistics as well) is the probability of seeing a correlation of that size or greater in a random sample of N observations if the null hypothesis is true. What's notable about this is that the p-value is not testing anything about any hypothesis other than the null hypothesis -- it can be used as evidence against the null hypothesis (it is not the probability that the null hypothesis is true), but past that it has nothing to say about any OTHER hypothesis. Which is why it's wrong to say that a p-value test is evidence of a causal relationship -- it's not even trying to test that.
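
To make the "probability of seeing a correlation that size if the null is true" idea concrete, here is a rough sketch using a permutation test. This is not how the linked analysis computed its p-value; it is just one way to simulate the null hypothesis by shuffling one variable so that any real pairing is destroyed. The data, the function names, and the choice of a two-sided test are all assumptions made for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, n_shuffles=10_000, seed=0):
    """Estimate P(|r| >= observed |r|) under the null of no relationship."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Made-up numbers, for illustration only (not real movie data).
scores = [35, 48, 52, 60, 61, 70, 74, 81, 88, 93]
gross = [12, 40, 18, 55, 30, 90, 45, 120, 80, 150]

p = permutation_p_value(scores, gross)
print(f"r = {pearson_r(scores, gross):.3f}, permutation p ~ {p:.4f}")
```

Note that nothing in the shuffle knows or cares which variable might cause the other. All it asks is how often chance alone produces a correlation this strong.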

1 comment

Yes, this is correct.

I just realized I misread the GP's comment: I thought it said "P-value" in general rather than "P-value of Pearson correlation," and as a result I thought he was referring to a regression P-value.

As far as I understand, the p-value of a Pearson correlation is exactly the same as the p-value of a simple linear regression.
You understand correctly. I have no idea what he's even trying to say now.
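
For anyone who wants to check the equivalence numerically, here is a small sketch. It computes the t-statistic from the correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)), and the t-statistic for the slope of a simple least-squares regression, slope / SE(slope). Both are compared against the same t distribution with n - 2 degrees of freedom, so equal t-statistics mean equal p-values. The data and function names are made up for the example.

```python
import math

def _sums(xs, ys):
    """Return n, the means, and the centered sums of squares and products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return n, mean_x, mean_y, sxx, syy, sxy

def correlation_t(xs, ys):
    """t-statistic for testing that the Pearson correlation is zero."""
    n, _, _, sxx, syy, sxy = _sums(xs, ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

def slope_t(xs, ys):
    """t-statistic for testing that the regression slope of y on x is zero."""
    n, mean_x, mean_y, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err

# Made-up numbers, for illustration only.
x = [35, 48, 52, 60, 61, 70, 74, 81, 88, 93]
y = [12, 40, 18, 55, 30, 90, 45, 120, 80, 150]

t_corr = correlation_t(x, y)
t_reg = slope_t(x, y)
print(t_corr, t_reg)
```

The two numbers agree up to floating-point rounding, which is the algebraic identity behind the claim above.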