by jdmichal
3339 days ago
Obviously the data set doesn't become "weaker" or "lose value" -- it's data, and running stats against it doesn't change it. However, every test for a correlation against a data set has some chance of yielding a false positive or false negative. This chance is called the p-value, and typically .05, or 5%, is the minimum requirement to be considered "significant". But that means that if you test for 20 or so correlations, you would expect one of them to be wrong. And the only thing that can fix that is reproducing the test with a different data set. Searching for "science reproduction crisis" will give a lot of good results for further reading. This topic is also what this XKCD is about -- and it's not a coincidence that there are 20 "test" frames with a .05 p-value: https://www.xkcd.com/882/
A p-value of 5% means that, IF the null hypothesis is true (IF!), then there's a 5% chance of getting results at least as extreme as those measured.
A p-value of 5% does not mean that you should expect a rate of 5% false positives & negatives.
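
To make the "IF the null is true" condition concrete, here is a rough sketch of the 20-test scenario. It rests on assumptions that are not in the thread: all 20 null hypotheses are true, the tests are independent, and p-values are uniform on [0, 1] under the null (which holds for continuous test statistics). The function names and the 0.05 / 20 defaults are just illustrative.

```python
import random

ALPHA = 0.05   # significance threshold (assumed, matching the ".05" in the thread)
N_TESTS = 20   # number of independent tests, every one with a true null (assumed)


def expected_false_positives(alpha=ALPHA, n_tests=N_TESTS):
    """Expected number of 'significant' results when every null is true."""
    return alpha * n_tests


def prob_at_least_one_false_positive(alpha=ALPHA, n_tests=N_TESTS):
    """Chance that at least one of n independent true-null tests falls below alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(n_runs=100_000, alpha=ALPHA, n_tests=N_TESTS, seed=0):
    """Monte Carlo check: draw uniform p-values and count runs with any p < alpha."""
    rng = random.Random(seed)
    runs_with_hit = 0
    for _ in range(n_runs):
        # Under a true null, each p-value is modeled as Uniform(0, 1).
        if any(rng.random() < alpha for _ in range(n_tests)):
            runs_with_hit += 1
    return runs_with_hit / n_runs


if __name__ == "__main__":
    print("expected false positives:", expected_false_positives())
    print("P(at least one), analytic:", prob_at_least_one_false_positive())
    print("P(at least one), simulated:", simulate())
```

Under these assumptions the expected count is 1 false positive in 20 tests, and the chance of seeing at least one is about 64%. Neither number says anything about false negatives, or about what fraction of "significant" findings are wrong when some of the effects are real; those depend on statistical power and on how many of the tested hypotheses are actually true.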