I’m wondering how often p-values are even used in papers about archeological digs? It seems like historical arguments are often made without doing statistics at all?
Yes, but I’m annoyed with the low-effort use of science-based metaphor, and taking it more seriously leaves an opening for someone who actually knows something to elaborate.
In statistics, the p-value is shorthand for “how unlikely was this result.” Smaller p-values indicate less likely results, which in turn count as stronger evidence of a relationship between variables. Many naive approaches to statistical analysis place an almost magical value on the 5% threshold, but crossing it is not actually a rare event if you run dozens of tests. P-hacking generally refers to running many tests and discarding the results that do not support what you want to be true. It’s a big problem in academia.
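To put a rough number on the "dozens of tests" point, here is a minimal Python sketch. It assumes the tests are independent and that nothing real is going on, so each p-value is uniform between 0 and 1. The function names are made up for illustration. Under those assumptions, 20 tests give roughly a 64% chance that at least one lands below 5%.

```python
import random


def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    falls below alpha when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate(num_tests, alpha=0.05, trials=20000, seed=0):
    """Check the formula by simulation. Assumption: under a true null,
    a p-value behaves like a uniform draw on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study" = num_tests p-values; count it if any looks significant.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m in (1, 5, 20, 50):
        print(m, round(chance_of_false_positive(m), 3), round(simulate(m), 3))
```

Reporting only the tests that cleared the threshold, and staying quiet about the rest, is what turns this ordinary arithmetic into p-hacking.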
Technically, it’s how unlikely an event at least that extreme is under a null model.
The null model is key: if you are mischievous, you can just define a seemingly benign but incorrect null model and generate extreme p-values without discarding anything. All values will be significant!
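A small sketch of that trick in Python, under invented assumptions: the data are pure noise with mean 0 and standard deviation 1, and the test is a simple two-sided z-test. The only "cheat" is that the null model claims a standard deviation of 0.1. The function names and numbers are hypothetical, chosen just to show the effect.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, null_mean, null_sd):
    """Two-sided z-test p-value for the sample mean under the stated null."""
    z = (fmean(sample) - null_mean) / (null_sd / math.sqrt(len(sample)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(null_sd, true_sd=1.0, n=30, trials=5000, alpha=0.05, seed=0):
    """Fraction of pure-noise samples declared significant when the null
    model assumes standard deviation null_sd (the true value is true_sd)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The true mean is 0, so there is no real effect to find.
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        if z_test_p_value(sample, 0.0, null_sd) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("honest null (sd = 1.0):", rejection_rate(null_sd=1.0))
    print("wrong null  (sd = 0.1):", rejection_rate(null_sd=0.1))
```

With the honest null, about 5% of noise samples come out significant, as designed. With the too-narrow null, the large majority do, even though no test was discarded and there is no effect in the data.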