Hacker News


	by ncmncm 2365 days ago
	Why not? Science that insists on hypotheses written down beforehand is cargo-cult science. Observation is the first and most productive science. Double-blind experiments are for cementing gains.

MrEldritch 2365 days ago

Basically, because once you start trying multiple hypotheses on the same dataset, the math used to determine "is this conclusion real, or am I just fooling myself" begins to break down.

The statistical significance threshold usually used is p<0.05. That means a result is generally considered a real discovery (though this is beginning to change since the replication crisis) if, assuming there is no real effect under the chosen model, a result at least that extreme would turn up less than 1 time in 20.

As soon as you start trying multiple hypotheses, that 1-in-20 false-positive rate applies to each test separately and stops describing your analysis as a whole. If you can just keep rolling d20s until one of them comes up a critical hit, you can easily generate false positives that still look very robust.
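
To put numbers on the d20 analogy, here is a minimal Python sketch. It assumes k independent tests at alpha = 0.05 with every null hypothesis true; the function names are made up for illustration. The exact chance of at least one false positive is 1 - (1 - alpha)^k, which for k = 20 is about 0.64. The simulation checks that formula by treating each null p-value as a uniform draw on [0, 1].

```python
import random

def fwer(alpha, k):
    # Chance of at least one false positive across k independent tests
    # when every null hypothesis is true: 1 - (1 - alpha)^k.
    return 1 - (1 - alpha) ** k

def simulated_fwer(alpha, k, trials=20_000, seed=0):
    # Under a true null with a continuous test statistic, a p-value is uniform
    # on [0, 1], so each "d20 roll" is one uniform draw. Count the studies in
    # which at least one of the k draws lands below alpha.
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(k)) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(fwer(0.05, k), 3), round(simulated_fwer(0.05, k), 3))
```

With one test the rate is the advertised 5%. By 20 tests, a "significant" finding somewhere is more likely than not, even though nothing real is going on.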

This is exactly the sort of bad science (p-hacking, fishing expeditions, the garden of forking paths) that led to the replication crisis. That makes sense, as this paper is from 2013 and predates widespread awareness of the crisis.

dodobirdlord 2365 days ago

The math continues to work out as long as you use the right approach. You have to collect twice as much data, and then set half of it aside at random without examining it. Then you can do whatever perverse p-hacking, multi-modeling, or curve-fitting you like to the half you kept until you reach a hypothesis, then check that one hypothesis, once, against the half you set aside to recover the statistical significance you lost by using techniques that may have overfit the first half. Unsurprisingly, the math works out because this approach is isomorphic to collecting the first half, studying it to form a hypothesis, and then conducting a proper pre-hypothesized experiment to collect the second half. Validation via holdout sets is the same approach used in machine learning and elsewhere to prevent models from overfitting data.
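
A minimal Python sketch of the holdout idea, under illustrative assumptions that are not from the thread: the data are pure noise, each subject has 20 hypothetical coin-flip "features" with no real effect, and each feature is tested with a two-sample z-test that assumes the outcome's standard deviation is known to be 1 (true for the simulated data). On the exploration half we go fishing and keep whichever feature looks best. On the untouched half we test only that one feature.

```python
import random
from statistics import NormalDist, fmean

def z_test_pvalue(group_a, group_b, sigma=1.0):
    # Two-sided p-value for a difference in means, assuming the outcome's
    # standard deviation is known (sigma=1 matches the simulated data).
    if not group_a or not group_b:
        return 1.0
    se = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def feature_pvalue(rows, j):
    # Compare outcomes of rows where feature j is on against rows where it is off.
    on = [y for y, feats in rows if feats[j]]
    off = [y for y, feats in rows if not feats[j]]
    return z_test_pvalue(on, off)

def one_study(rng, n=200, k=20):
    # Pure noise: no feature has any real effect on the outcome.
    rows = [(rng.gauss(0, 1), [rng.random() < 0.5 for _ in range(k)]) for _ in range(n)]
    rng.shuffle(rows)  # set half aside at random
    explore, holdout = rows[: n // 2], rows[n // 2 :]
    # Fishing expedition on the exploration half: keep the best-looking feature.
    pvals = [feature_pvalue(explore, j) for j in range(k)]
    best = min(range(k), key=lambda j: pvals[j])
    # Test that single, now pre-chosen hypothesis once on the untouched half.
    return pvals[best], feature_pvalue(holdout, best)

def simulate(trials=300, seed=0):
    rng = random.Random(seed)
    results = [one_study(rng) for _ in range(trials)]
    explore_rate = sum(p < 0.05 for p, _ in results) / trials
    holdout_rate = sum(q < 0.05 for _, q in results) / trials
    return explore_rate, holdout_rate

if __name__ == "__main__":
    explore_rate, holdout_rate = simulate()
    print(f"'significant' on exploration half: {explore_rate:.2f}")
    print(f"'significant' on holdout half:     {holdout_rate:.2f}")
```

The best feature on the exploration half should clear p<0.05 in most simulated studies, while the same feature checked on the holdout half should clear it only about 5% of the time. Testing on the holdout is only valid because the hypothesis was fixed before looking at that half, and only if it is tested there once.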

MrEldritch 2365 days ago

This is true! I was trying to simplify things a bit for a basic explanation, but I fear I oversimplified. I just meant that the generally used math breaks down; if you're aware of the problem, you can correct for it, but very often people don't.

ncmncm 2365 days ago

Stating it more plainly, what you wrote was incorrect, and unfairly tarred a statement that was, in fact, correct.

ORioN63 2365 days ago

Thanks! For someone who didn't understand why this was considered p-hacking, that made a whole lot of sense.

ncmncm 2365 days ago

p<0.05 is also cargo-cult science, and is much more responsible for the replication crisis, along with biased sampling (e.g. a study population of 18-to-22-year-old US psychology students).

It is also why we see repeated, spurious insistence that anti-depressants don't do anything.

Experiment design is a subtle skill.

stygiansonic 2365 days ago

You seem to be under the impression that a study like this gives a hard "yes/no" answer as to whether some hypothesis is true. That is not the case, and it rarely is for studies like these. Instead, you need to do some sort of statistical hypothesis test.

As other comments have pointed out, once you start testing multiple hypotheses on the same dataset, you cannot apply the same significance threshold that you would if you had begun with a single hypothesis before observing the data. Instead, you need to apply some sort of correction that takes into account the number of hypotheses being tested:

https://en.wikipedia.org/wiki/Family-wise_error_rate#Control...
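
As a rough illustration, here is a Python sketch of two standard family-wise corrections, Bonferroni and Holm's step-down method. They are chosen here for illustration; the linked page covers more options. The example p-values are made up.

```python
def bonferroni(pvalues, alpha=0.05):
    # Reject hypothesis i when p_i <= alpha / m. Controls the family-wise
    # error rate at alpha regardless of how the tests are correlated.
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def holm(pvalues, alpha=0.05):
    # Holm's step-down procedure: walk the p-values from smallest to largest,
    # comparing the r-th smallest (r = 0, 1, ...) against alpha / (m - r),
    # and stop at the first one that fails. Same FWER guarantee as
    # Bonferroni, but never rejects fewer hypotheses.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject

if __name__ == "__main__":
    pvals = [0.01, 0.02, 0.03, 0.005]  # hypothetical results from 4 tests
    print(bonferroni(pvals))  # [True, False, False, True]
    print(holm(pvals))        # [True, True, True, True]
```

Bonferroni compares every p-value against 0.05 / 4 = 0.0125, so only two survive. Holm relaxes the threshold step by step (0.0125, then about 0.0167, 0.025, 0.05) and keeps all four here, while still holding the chance of any false positive at or below 5%.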

yodon 2365 days ago

No. If you collect data and then hunt for "significant" results in it, you are all but guaranteed to find spurious results once you check enough comparisons. This is one of the most basic truths of statistics.

lonelappde 2365 days ago

You are confusing hypothesis generation with hypothesis testing. Both are science, but only one is a reliable way to determine truth.

throwawayhhakdl 2365 days ago

Probable claims. Not truth.

lupire 2365 days ago

In the non-Platonic real world, truth is the set of claims that we believe have high probability.

BenoitEssiambre 2365 days ago

Not if you want to claim statistical significance. The math behind this method is based on defining the hypothesis before seeing the data (and even then it's usually very weak evidence of a tiny signal within the noise).

shanemhansen 2365 days ago

xkcd explains it better than I can. Basically, if you run 20 tests at p<0.05 when nothing real is going on, you're more likely than not to "discover" at least one falsehood.

https://xkcd.com/882/
