Hacker News
by jermaink 3807 days ago
In the evaluation of correlations, it can always be informative to know the confidence interval for r, with all caution towards p-value interpretation.

Surely, correlation provides information on association rather than cause and effect (causation should rather be modeled with Granger and other regression models). Sample sizes and variances will certainly contribute to different p-value outcomes. This is because p-values reward low variance more than the magnitude of the effect (see Type I/II errors etc.). If you have p-values, it is better to report them and add a footnote on how to interpret them.
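To make the confidence-interval suggestion concrete, here is a minimal sketch using the Fisher z-transformation. It assumes roughly bivariate-normal data, and the function name `fisher_ci` and the example values are made up for illustration, not taken from the post being discussed.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r from n pairs.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1/sqrt(n - 3). Requires -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

if __name__ == "__main__":
    # Hypothetical values: the same r observed at two sample sizes.
    for n in (30, 1000):
        lo, hi = fisher_ci(0.15, n)
        print(f"n={n}: r=0.15, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With the same r, the interval is much wider at the small sample size, which is the kind of information a bare correlation coefficient hides.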

1 comment

> However, the significance or p-value reflects the probability that the correlation does not imply a causal relation.

Technically, in this case, a significance test would answer the question "is the Pearson correlation significantly different from 0?" Here, we would expect the test to fail since it clearly isn't, and the test is therefore less helpful/important. (Even if it passed, the conclusion would be "correlations are low in magnitude and therefore do not matter", as noted in the post anyway.)

Finding the exact P-value of a Pearson correlation requires setting up bootstrapping, which is not something I have handy at the moment, but I will work on it in future posts.
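For reference, one resampling approach can be sketched as below. Note that this is a permutation test, not the bootstrap mentioned above, and it is not the post author's method; the helper names and the toy data are hypothetical. (Under a bivariate-normal assumption, a p-value is also available in closed form from a t statistic with n - 2 degrees of freedom, so resampling is one option rather than a requirement.)

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for H0: no association.

    Shuffling y breaks any pairing with x, so the shuffled correlations
    approximate the null distribution of r.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy data, made up for illustration only.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
    print(pearson_r(x, y), permutation_pvalue(x, y))
```

The smallest p-value this can report is 1 / (n_perm + 1), so the number of permutations limits the resolution.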

Again, I'm not looking at R^2 and the P-value of a linear regression, which is different.

It's just a recommendation to improve the reporting, not a general defense of p-values. Using Pearson does not imply that you are analyzing a causal relationship. I see the point that it's not linear (then you would have fitted a linear regression, I assume), but I can still tell you that missing p-values may raise eyebrows :)

In small samples, a correlation can easily be significant, often at the cost of low confidence. Conversely, in large samples, the magnitude of the effect may be lower but estimated with higher confidence. In both cases, results have to be interpreted with caution. The recent p-value debate points towards a lot of issues here. For instance, there have been medical studies overestimating correlations in small samples, while other authors seemed to underestimate their long-term, large-sample results with correlations in the ballpark of 0.15 (p<0.05).
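A rough sketch of how sample size drives the p-value for a fixed r of about 0.15. It uses the Fisher z normal approximation rather than an exact test, and the function name and sample sizes are chosen purely for illustration.

```python
import math
from statistics import NormalDist

def approx_pvalue(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Based on the Fisher z-transformation: atanh(r) * sqrt(n - 3) is
    treated as a standard normal statistic. Requires -1 < r < 1, n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

if __name__ == "__main__":
    # Same effect size, different (hypothetical) sample sizes.
    for n in (30, 200, 1000):
        print(f"r=0.15, n={n}: p ~ {approx_pvalue(0.15, n):.4f}")
```

The effect size is identical in every row; only the sample size changes whether it crosses the 0.05 threshold.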