


	by jermaink 3811 days ago
	Interesting! My question is: What do the p-values of your Pearson correlations look like? Correlations alone are vague. Please insert :-)

1 comments

minimaxir 3811 days ago

I deliberately avoided using P-values since that metric is used to imply a causal relationship between review scores and box office gross, which is definitely not true.


jermaink 3811 days ago

In the evaluation of correlations, it can always be informative to know the confidence interval for r, with all caution towards p-value interpretation.
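A confidence interval for r, as suggested above, can be approximated with the Fisher z-transform. The following is a minimal sketch, assuming roughly bivariate-normal data; the function name `pearson_ci` and the example values of r and n are hypothetical and not taken from the post.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r (Fisher z-transform)."""
    if n <= 3:
        raise ValueError("the Fisher approximation needs n > 3")
    z = math.atanh(r)                       # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then map it back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical values for illustration only.
low, high = pearson_ci(r=0.15, n=500)
print(f"r = 0.15, n = 500: 95% CI = ({low:.3f}, {high:.3f})")
```

The interval narrows as n grows, which is why the same r can be reported with very different degrees of certainty.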

Surely, correlation provides information on association rather than cause and effect (causation should rather be modeled with Granger and other regression models). Sample sizes and variances will certainly contribute to different p-value outcomes. This is because p-values reward low variance more than the magnitude of impact (Type I/II error etc.). If you have p-values, better report them and add a footnote on how to interpret them.


minimaxir 3811 days ago

> However, the significance or p-value reflects the probability that the correlation does not imply a causal relation.

Technically, in this case, a significance test would answer the question "is the Pearson correlation significantly different from 0?" Here, we would expect it to fail since it clearly isn't, and therefore the test is less helpful/important. (Even if it passed, the conclusion would be "correlations are low in magnitude and therefore do not matter," as noted in the post anyway.)

Finding the exact P-value of a Pearson correlation requires setting up bootstrapping, which is not something I have handy at the moment but will work on in future posts.
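Bootstrapping is one way to attach uncertainty to a Pearson correlation. Below is a minimal percentile-bootstrap sketch using only the standard library. The helper names, the synthetic data, and the resample count are all assumptions for illustration, not the post's actual setup.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for r, resampling (x, y) pairs together."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples where r is undefined (zero variance).
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        stats.append(pearson_r(bx, by))
    stats.sort()
    m = len(stats)
    lower = stats[int((1 - level) / 2 * m)]
    upper = stats[min(m - 1, int((1 + level) / 2 * m))]
    return lower, upper


# Synthetic "score" and "gross" values, invented for this example.
rng = random.Random(1)
scores = [rng.uniform(0, 100) for _ in range(80)]
gross = [0.3 * s + rng.gauss(0, 30) for s in scores]
print(bootstrap_ci(scores, gross))
```

If the resulting interval excludes 0, that roughly corresponds to rejecting the null hypothesis of zero correlation at the matching level.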

Again, I'm not looking at R^2 and the P-value of a linear regression, which is different.


jermaink 3811 days ago

It's just a recommendation to improve the reporting, not a general defense of p-values. Computing a Pearson correlation does not imply analyzing a causal relationship. I see the point that it's not linear (otherwise you would have fitted a linear regression, I assume), but I can still tell you that missing p-values may raise eyebrows :)

In small samples, a correlation can easily be significant, often at the cost of low confidence. Conversely, in large samples, the magnitude of the effect may be lower but at higher confidence. In both cases, results have to be interpreted with caution. The recent p-value debate points towards a lot of issues here. For instance, there have been medical studies overestimating correlations in small sample sizes while other authors seemed to underestimate their long-term large-sample results with correlations in the ballpark of 0.15 (p<0.05).
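The dependence on sample size can be made concrete with a small sketch. It uses the Fisher z normal approximation rather than the exact t-test, and the function names and sample sizes are hypothetical, chosen only to illustrate the point.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def min_significant_r(n, alpha=0.05):
    """Smallest |r| that would reach significance at level alpha for sample size n."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(crit / math.sqrt(n - 3))


# Same r = 0.15, different (hypothetical) sample sizes.
for n in (20, 200, 2000):
    print(f"n = {n:5d}  p ~ {approx_p_value(0.15, n):.4f}  "
          f"min significant |r| ~ {min_significant_r(n):.3f}")
```

In small samples only large correlations can pass the threshold, so the significant ones tend to be overestimates; in large samples even a weak r such as 0.15 passes easily.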


cwyers 3810 days ago

You need to stop saying stuff. You're incredibly wrong here.


andreasvc 3810 days ago

Could you please state the problem instead of making ad hominems? I'm genuinely curious.


cwyers 3810 days ago

The short version is that p-values have nothing to do with establishing a causal mechanism. It's a test of statistical significance; it doesn't try to say if that significance is because x causes y or y causes x or some unknown variable z causes both y and x.

The long version: So, in this case, we have two variables, Metacritic score (or Rotten Tomatoes percentage) and box office gross. We have measured the correlation between them for some number N of movies. N is smaller than the total population of movies that could be evaluated; if nothing else, the analysis doesn't consider movies that haven't been released yet. So the movies evaluated are considered, for the purposes of a p-value test, to be a sample of size N from a hypothetical infinite population of movies with box office gross and Metacritic scores. The p-value test also assumes that there's something called a null hypothesis, which is that there is no relationship between Metacritic scores and box office gross. The null hypothesis is the hypothesis that all other hypothesises (is that the right plural? I don't know) are evaluated against.

What the p-value measures (in this case, it can be applied to other statistics as well) is the probability of seeing a correlation of that size or greater given a random sample of N observations if the null hypothesis is true. What's notable about this is that the p-value is not testing anything about any hypothesis other than the null hypothesis -- it can be used as evidence about the likelihood of the null hypothesis being true, but past that it has nothing to say about any OTHER hypothesis. Which is why it's wrong to say that a p-value test is evidence of a causal relationship -- it's not even trying to test that.
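The definition above can be simulated directly with a permutation test: shuffling one variable breaks any relationship, which mimics the null hypothesis. This is a minimal sketch with invented data; the helper names and the two-sided choice (comparing absolute values of r) are assumptions made for the example.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Share of shuffled datasets whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)              # simulate "no relationship"
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for review score and box office gross (not real data).
rng = random.Random(42)
scores = [rng.uniform(0, 100) for _ in range(60)]
gross = [0.5 * s + rng.gauss(0, 40) for s in scores]
print("r =", round(pearson_r(scores, gross), 3))
print("permutation p-value =", permutation_p_value(scores, gross))
```

Nothing in this procedure refers to a causal mechanism: it only asks how often chance alone would produce a correlation this large.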


minimaxir 3810 days ago

Yes, this is correct.

I just realized I misread the GP's comment: I thought it said "P-value" in general instead of "P-value of Pearson correlation"; as a result, I thought he was referring to a regression P-value.


andreasvc 3810 days ago

As far as I understand the p-value of a Pearson correlation is exactly the same as the p-value of a simple linear regression.
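This equivalence can be checked numerically for a simple (one-predictor) least-squares regression with the usual t-test on the slope. The sketch below computes the t statistic both ways; since both have n - 2 degrees of freedom, equal t statistics imply equal p-values. The data and function names are made up for the example.

```python
import math
import random


def _sums(xs, ys):
    """Return n, the means, and centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return n, mx, my, sxx, syy, sxy


def correlation_t(xs, ys):
    """t statistic for H0: rho = 0, computed from Pearson's r."""
    n, _, _, sxx, syy, sxy = _sums(xs, ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def slope_t(xs, ys):
    """t statistic for H0: slope = 0 in the regression y = a + b*x."""
    n, mx, my, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return slope / se_slope


# Synthetic data, for illustration only.
rng = random.Random(0)
xs = [rng.uniform(0, 100) for _ in range(50)]
ys = [0.2 * x + rng.gauss(0, 25) for x in xs]
print(correlation_t(xs, ys), slope_t(xs, ys))   # the two values should agree
```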
