by kristjansson 1694 days ago
It's worth pulling the principles from the ASA's statement [2] as well:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  
  4. Proper inference requires full reporting and transparency.
  
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
The basic criticism is one of brittleness: unless very carefully planned, executed, and interpreted, p-values from hypothesis tests do not support the claims some would like to make about their results, and meeting that first condition is so difficult that the technique should not be recommended. One _should_ look for 'significant' results, but using measures that align better with colloquial understandings of significance, i.e. with how users are misinterpreting p-values now.
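To make principle 5 concrete, here is a minimal sketch of how a p-value tracks sample size as much as effect size. It assumes a two-sample z-test with a known, shared standard deviation and equal group sizes; the function name and all the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means.

    Assumes a known, shared standard deviation and equal group sizes
    (illustrative assumptions, not a general-purpose test).
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * NormalDist().cdf(-abs(z))


# Tiny effect (1% of a standard deviation), huge sample.
p_tiny_effect = two_sample_z_pvalue(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)

# Large effect (half a standard deviation), small sample.
p_large_effect = two_sample_z_pvalue(mean_diff=0.5, sd=1.0, n_per_group=10)

print(f"tiny effect,  n=1,000,000 per group: p = {p_tiny_effect:.2e}")
print(f"large effect, n=10 per group:        p = {p_large_effect:.3f}")
```

The practically negligible difference clears any conventional threshold, while the much larger one does not, which is why the effect size has to be reported alongside the p-value.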

> P-values do not measure the probability that the studied hypothesis is true

So, what’s the best way to measure the probability that the studied hypothesis is true?

The argument against p-values is part of the argument against any bright-line single-number rule for identifying truth. The job of the researcher is to demonstrate (at least) that

1. there is (isn't) an observable difference between groups of interest

2. the difference is (not) attributable to the hypothesized causal mechanism, i.e. the (absence of a) difference isn't due to random variation in the observed sample, i.e. the difference would be observed by an independent replication of the same analysis/study

3a. the difference is not explainable by other factors that vary between the groups, observed or unobserved

3b. the difference is not artificially inflated (suppressed) by the statistical choices

4. the difference is large enough to be practically relevant.

and so on

If the degree of certainty of statements about the difference can be characterized by a single number at the end of the process, great! But the goal should be a convincing, holistic story, not the single number.

I share your concern, and I worry that we'll find this battle continues 20 years from now.

Many things can go wrong with a p-value, and I'm not a statistician, but the things I look for in data are the structure/distribution of the "noise" and any correlations seen within the "noise" and the "signal". That helps you build a signal and noise model. Assuming that all your noise is inherently uncorrelated Gaussian is a pretty strong model assumption.
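One quick way to check the "uncorrelated" part is the lag-1 autocorrelation of the residuals, which should sit near zero if consecutive noise values are independent. This is only a rough sketch: the function name and both toy residual series are made up for illustration, and a real check would look at more lags and at the distribution as well.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a sequence of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return numerator / denominator


# Toy residuals that flip sign every step (strong negative correlation).
alternating = [1.0, -1.0] * 4

# Toy residuals that drift steadily upward (strong positive correlation).
drifting = [float(i) for i in range(8)]

r_alternating = lag1_autocorrelation(alternating)
r_drifting = lag1_autocorrelation(drifting)

print(f"alternating residuals: r1 = {r_alternating:.3f}")
print(f"drifting residuals:    r1 = {r_drifting:.3f}")
```

Values far from zero in either direction suggest the independence assumption behind the usual p-value calculation does not hold for that data.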

Use Bayesian statistics [1] :-)

[1] https://en.wikipedia.org/wiki/Bayes_factor
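For anyone wondering what that looks like in practice, here is a minimal sketch of a Bayes factor for a coin-flip style experiment. It compares H0 (success rate exactly 0.5) against H1 (success rate unknown, with a uniform prior on [0, 1]); under that prior the marginal likelihood of k successes in n trials is 1/(n+1). The prior, the 50/50 prior odds, the function names, and the 60-out-of-100 data are all assumptions chosen for illustration.

```python
from math import comb


def bayes_factor_10(successes, trials):
    """Bayes factor for H1 (uniform prior on the rate) over H0 (rate = 0.5)."""
    # With a uniform prior, every count from 0 to trials is equally likely,
    # so the marginal likelihood under H1 is 1 / (trials + 1).
    marginal_h1 = 1.0 / (trials + 1)
    likelihood_h0 = comb(trials, successes) * 0.5 ** trials
    return marginal_h1 / likelihood_h0


def posterior_prob_h1(bf10, prior_prob_h1=0.5):
    """Turn a Bayes factor and a prior probability into a posterior probability."""
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


bf10 = bayes_factor_10(successes=60, trials=100)
posterior = posterior_prob_h1(bf10)

print(f"BF10 = {bf10:.3f}")
print(f"P(H1 | data) with 50/50 prior odds = {posterior:.3f}")
```

With these made-up numbers the Bayes factor comes out a little below 1, so the data barely move the prior odds. Note that the answer depends on the prior chosen for H1, so this is not a bright-line number either.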