Distinguishing Cause From Effect Using Observational Data

	Distinguishing Cause From Effect Using Observational Data (medium.com)
	41 points by cyang08 3642 days ago

5 comments

gwern 3642 days ago

https://arxiv.org/abs/1412.3773 https://medium.com/the-physics-arxiv-blog/cause-and-effect-t...

sctb 3642 days ago

Thanks, we updated the link from http://www.vocativ.com/335705/correlation-causation to this.

cschmidt 3641 days ago

There is a popular science book on this topic as well:

Why: A Guide to Finding and Using Causes

https://www.amazon.com/Why-Guide-Finding-Using-Causes-ebook/...

It is sitting on my bookshelf, but I haven't managed to get to reading it yet.

kem 3641 days ago

This has been discussed in the stats literature for a while now. It's an interesting idea, but it makes a lot of assumptions about the nature of noise versus signal. It could be really useful in some situations and totally useless in others, depending on how realistic those assumptions are in any given scenario.

throwwit 3641 days ago

Is the reasoning behind the method a corollary to compressed sensing?

LoSboccacc 3642 days ago

So p = 0.8? Wasn't 0.95 the standard for claims once?

aab0 3641 days ago

No, because a classification accuracy is not a p-value. By construction, a random guesser would achieve 50% accuracy in guessing whether A~>B or A<~B for each cause-effect pair in their dataset. So getting >50% accuracy is the goal here.
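To make the distinction concrete, here is a minimal sketch that turns an accuracy into a p-value against the 50% chance baseline, using an exact binomial test. The function name and the 100-pair sample size are hypothetical choices for illustration only; they are not from the paper, and the sketch ignores the pair weighting the benchmark uses.

```python
from math import comb


def two_sided_p_value(correct: int, total: int) -> float:
    """Exact two-sided binomial p-value against a 50% coin-flip guesser.

    Under the null hypothesis each direction guess is right with
    probability 0.5, so the number of correct guesses is Binomial(total, 0.5).
    """
    # The null distribution is symmetric, so fold the count onto the upper tail.
    k = max(correct, total - correct)
    upper_tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Hypothetical example: 80% accuracy on 100 unweighted pairs.
    print(two_sided_p_value(80, 100))
    # Exactly at chance: no evidence against random guessing.
    print(two_sided_p_value(50, 100))
```

The accuracy (0.8) and the p-value are different quantities: the same 80% accuracy gives a very different p-value on 10 pairs than on 100.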

te 3641 days ago

Interestingly, the authors do acknowledge on p. 46 that their sample size is too small to obtain a statistically significant result:

A rough estimate how large the CauseEffectPairs benchmark should have been in order to obtain significant results can easily be made. Using a standard (conservative) Bonferroni correction, taking into account that we compared 37 methods, we would need about 120 (weighted) pairs for an accuracy of 65% to be considered significant (with two-sided testing and 5% significance threshold). This is about four times as much as the current number of 37 (weighted) pairs in the CauseEffectPairs benchmark. Therefore, we suggest that at this point, the highest priority regarding future work should be to obtain more validation data, rather than developing additional methods or optimizing computation time of existing methods. We hope that our publication of the CauseEffectPairs benchmark data inspires researchers to collaborate on this important task and we invite everybody to contribute pairs to the CauseEffectPairs benchmark data.
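The quoted estimate can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is an assumption about how such a number could be derived, not the authors' actual computation: it uses a normal approximation to the binomial, treats pairs as unweighted, and the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def pairs_needed(accuracy: float, n_methods: int, alpha: float = 0.05) -> int:
    """Approximate number of pairs for `accuracy` to beat chance significantly.

    Uses a Bonferroni-corrected, two-sided z-test against a chance
    accuracy of 0.5 (normal approximation to the binomial).
    """
    # Bonferroni: split the overall significance level across all methods.
    per_method_alpha = alpha / n_methods
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - per_method_alpha / 2)
    # Under the null, the standard error of the accuracy is 0.5 / sqrt(n).
    # Solve (accuracy - 0.5) / (0.5 / sqrt(n)) >= z for n.
    return math.ceil((z * 0.5 / (accuracy - 0.5)) ** 2)


if __name__ == "__main__":
    # Values taken from the quoted passage: 37 methods, 65% accuracy, 5% level.
    print(pairs_needed(0.65, 37))
```

Under these assumptions the result comes out a little above a hundred pairs, in the same range as the "about 120" in the quote. Without the Bonferroni correction (`n_methods=1`) the requirement drops considerably, which shows how much of the needed sample size comes from comparing many methods at once.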

aab0 3641 days ago

That matters if you want to single out a particular method as the best. But you can definitely tell that, overall, the methods are collectively outperforming chance, and so in this dataset it is possible to infer the direction of causation.
