by aab0
3642 days ago

No, because a classification accuracy is not a p-value. By construction, a random guesser would achieve 50% accuracy in guessing whether A~>B or A<~B for each cause-effect pair in their dataset, so getting more than 50% accuracy is the goal here.

> A rough estimate of how large the CauseEffectPairs benchmark should have been in order to obtain significant results can easily be made. Using a standard (conservative) Bonferroni correction, taking into account that we compared 37 methods, we would need about 120 (weighted) pairs for an accuracy of 65% to be considered significant (with two-sided testing and 5% significance threshold). This is about four times as much as the current number of 37 (weighted) pairs in the CauseEffectPairs benchmark. Therefore, we suggest that at this point, the highest priority regarding future work should be to obtain more validation data, rather than developing additional methods or optimizing computation time of existing methods. We hope that our publication of the CauseEffectPairs benchmark data inspires researchers to collaborate on this important task and we invite everybody to contribute pairs to the CauseEffectPairs benchmark data.
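To make the arithmetic concrete, here is a rough Python sketch of the kind of calculation behind "about 120 pairs". It is my own reconstruction, not the paper's code, and it rests on assumptions: the benchmark is treated as n independent, equally weighted pairs (the paper's per-pair weights are ignored), a random guesser is modelled as Binomial(n, 0.5), and we look for the smallest n at which 65% accuracy passes a two-sided test at the Bonferroni-corrected level 0.05/37. The function names are hypothetical. It also illustrates the accuracy vs. p-value distinction: accuracy is the test statistic, and the p-value comes from asking how often a coin-flipper would do at least that well.

```python
import math
from statistics import NormalDist


def binomial_two_sided_p(correct: int, n: int) -> float:
    """Exact two-sided binomial p-value for `correct` hits out of `n`
    decisions, against a coin-flip guesser (success probability 0.5)."""
    # Binomial(n, 0.5) is symmetric, so the two-sided p-value is twice the
    # tail beyond the observed count (on whichever side it falls), capped at 1.
    k = max(correct, n - correct)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)


def min_pairs_exact(accuracy_pct: int, alpha: float = 0.05,
                    n_methods: int = 37, n_max: int = 1000):
    """Hypothetical helper: smallest n for which accuracy_pct% correct is
    significant at alpha / n_methods (Bonferroni), using the exact test.
    Returns None if no n <= n_max qualifies."""
    alpha_bonf = alpha / n_methods
    for n in range(1, n_max + 1):
        # ceil(accuracy_pct% of n) in integer arithmetic (avoids float rounding)
        correct = -(-accuracy_pct * n // 100)
        if binomial_two_sided_p(correct, n) <= alpha_bonf:
            return n
    return None


def min_pairs_normal(accuracy_pct: int, alpha: float = 0.05,
                     n_methods: int = 37) -> int:
    """Hypothetical helper: same question via the normal approximation.
    Under chance, accuracy has standard error 0.5 / sqrt(n), so we need
    (accuracy - 0.5) / (0.5 / sqrt(n)) >= z_{1 - alpha_bonf / 2}."""
    alpha_bonf = alpha / n_methods
    z = NormalDist().inv_cdf(1 - alpha_bonf / 2)
    effect = accuracy_pct / 100 - 0.5
    return math.ceil((0.5 * z / effect) ** 2)


if __name__ == "__main__":
    alpha_bonf = 0.05 / 37
    # 65% of 37 pairs rounds up to 25 correct (pairs treated as unweighted here).
    p37 = binomial_two_sided_p(25, 37)
    print(f"25/37 correct: p = {p37:.4f}, Bonferroni threshold = {alpha_bonf:.5f}")
    print("exact binomial:", min_pairs_exact(65), "pairs")
    print("normal approx: ", min_pairs_normal(65), "pairs")
```

Because correct counts are whole numbers, significance is not perfectly monotone in n, so `min_pairs_exact` just reports the first n that passes. Both routes land in the same ballpark as the paper's "about 120", and `binomial_two_sided_p(25, 37)` shows that 65% on 37 pairs stays well above the corrected threshold.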