Hacker News
by ikura 1337 days ago
From the article:

   Personally, I think the dichotomy between hypothesis-testing and likelihood-quantification is a false one. The “P=0.05” cutoff we use to “reject” a hypothesis is an arbitrary one. When I read papers, I never “accept” or “reject” hypotheses but rather consider likelihood quantification as a measure of the weight of evidence or a distance of the data from some null hypothesis, as measured by some statistic. I encourage everyone else to consider this probabilistic worldview when viewing our paper: we aimed to quantify probabilities of this system occurring in nature, and P-values were convenient and commonly understood ways of communicating quantiles.

This paragraph does a lot of heavy lifting. Conflating p-values and probabilities is the science equivalent of a code smell.
3 comments

Though p-values are probabilities.

They are the probability of observing data at least as extreme as the data seen in the experiment, given that the null hypothesis is true.
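
To make that definition concrete, here is a minimal sketch, assuming a made-up experiment: 20 coin flips, 16 heads, and a null hypothesis that the coin is fair. The function name `binomial_p_value` and the numbers are illustrative assumptions, not anything from the article.

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) for X ~ Binomial(flips, p_null)."""
    # Sum the null probability of every outcome at least as extreme
    # as the one observed.
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical experiment: 16 heads in 20 flips of a supposedly fair coin.
p = binomial_p_value(heads=16, flips=20)
print(f"p = {p:.4f}")  # p = 0.0059
```

Note what the number is conditioned on: it is the probability of the data (or more extreme) given the null, not the probability of the null given the data.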

Now, of course, to fully understand the p-value, you also have to understand the null hypothesis. And yeah, sometimes it is misspecified (e.g., by testing many null hypotheses and only showing the more interesting ones, or by accidentally creating a bad, unlikely null hypothesis that leaves room for many uninteresting alternative hypotheses).

Obligatory p-value snippet from the ASA:

   P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108

To follow that up (so people know what they actually are), what p-values represent is the probability that we would observe data at least as extreme as ours, given the null hypothesis.

Setting a cutoff of .05 is saying “if there’s less than a 5% chance we’d see data this extreme, assuming the null hypothesis, then we treat the null hypothesis as false.”

But this statement only applies to a 'naive' (or first) statistical analysis or test on the dataset. Once the researcher starts changing their assumptions in response to the results they're seeing, they're p-hacking and p-values are no longer meaningful. In addition, once you have multiple researchers looking at the same dataset with different assumptions, and you factor in publication bias, the p-value also loses meaning.

Well, yes and no. The p-value still means the same thing, but when you take a dataset and go looking for any result that is under a certain threshold, you’ll probably find it. “Unlikely” events happen all the time!

What your comment is highlighting is an issue with bad experimental design. (And, obviously, with our publication regime)
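
A rough simulation of the "go looking and you'll find it" point, as a sketch under stated assumptions: every test below is run on pure noise (the null is true by construction), using a two-sided z-test with a known standard deviation of 1. The function names, sample size, and seed are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def null_experiment_p_value(rng: random.Random, n: int = 30) -> float:
    """Two-sided z-test p-value for 'mean == 0' on data where the null is true."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * n**0.5  # standard deviation is known to be 1
    return 2 * (1 - NormalDist().cdf(abs(z)))

def count_false_positives(n_tests: int, alpha: float = 0.05, seed: int = 0) -> int:
    """Run n_tests independent true-null experiments; count those with p < alpha."""
    rng = random.Random(seed)
    return sum(null_experiment_p_value(rng) < alpha for _ in range(n_tests))

hits = count_false_positives(n_tests=100)
print(f"{hits} of 100 true-null tests had p < 0.05")
```

Each individual p-value is still computed correctly. The trouble is the search: with 100 independent tests at a 0.05 threshold, the chance that at least one comes in under it is 1 - 0.95^100, roughly 99.4%.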

I'm becoming more and more convinced we need to multiply anything that is not strictly a probability (CIs, ML model scores, p-values) by 100.

"I have a confidence of 95" has a very different ring to it than "I am 95% confident."

It would also prevent people from doing stupid things like using these values to compute expectations.

A p-value is strictly _a_ probability.

I'm an idiot.