| HN Mirror



	by nilkn 2984 days ago
	While intended as light humor, this actually seems like a really damning argument to me. It's conceptually similar to overfitting a machine learning model by aggressively tuning hyperparameters without proper cross-validation, etc. What serious defenses are there after this sort of attack?


prestonh 2983 days ago

Serious defense: p-values have been in use for a long time, and while they are error prone a larger number of true results has been found than false results (according to recent research the ratio is 2:1 reproducible to non-reproducible).

In the cartoon, the scientists are making multiple comparisons, which frequentist hypothesis testing does not allow without a correction. One way to handle it is the Bonferroni correction: divide the significance threshold ("alpha") by the number of comparisons being made, in this case 20. The cartoon does not state its actual p-value, as most journals would require, but the hope would be that, once the threshold is divided by the corrective factor, the significance of that particular comparison goes away.
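
To put rough numbers on that correction, here is a minimal Python sketch. It assumes the 20 comparisons are independent and that every null hypothesis is true, which the cartoon doesn't actually tell us; the function names are made up for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison threshold after dividing alpha by the number of comparisons."""
    return alpha / n_tests


ALPHA = 0.05   # conventional significance threshold
N_TESTS = 20   # number of comparisons in the cartoon

uncorrected_rate = familywise_error_rate(ALPHA, N_TESTS)
corrected_alpha = bonferroni_threshold(ALPHA, N_TESTS)
corrected_rate = familywise_error_rate(corrected_alpha, N_TESTS)

print(f"chance of a spurious hit, uncorrected: {uncorrected_rate:.3f}")
print(f"per-comparison threshold, corrected:   {corrected_alpha:.4f}")
print(f"chance of a spurious hit, corrected:   {corrected_rate:.3f}")
```

Under those assumptions the uncorrected chance of at least one "significant" result is about 64%, and the corrected per-comparison threshold is 0.0025, which brings the overall chance back under 5%.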

So p-value methods still lead to a lot of Type I and Type II errors, but in the past they have been the best science has been able to come up with. Actually, probably the greatest issue with false results in the scientific literature is that null results are not publishable. This leads to a case where 20 scientists might independently perform the same experiment where the null is true, and only one of them finds a significant result. The demand for positive results then acts as a filter that lets only the Type I errors through! This is just one problem with the publishing culture, and it doesn't take into account researchers' temptation to manipulate the data or experiment until p < .05.
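
The publication filter is easy to simulate. The sketch below is only an illustration under stated assumptions: every lab runs the same two-sided z-test on 30 draws from a standard normal distribution, so the null (mean zero) is true by construction, and only results with p < 0.05 get "published". The lab count, sample size, seed, and function names are all hypothetical.

```python
import random
from statistics import NormalDist, fmean


def lab_p_value(rng: random.Random, n_samples: int = 30) -> float:
    """Two-sided z-test p-value for one lab when the true effect is exactly zero."""
    # Samples come from N(0, 1), so the null hypothesis (mean == 0) is true.
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Known sigma = 1, so the z statistic is mean / (1 / sqrt(n)).
    z = fmean(samples) * n_samples ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_labs(n_labs: int = 10_000, alpha: float = 0.05, seed: int = 0):
    """Return every lab's p-value plus the subset that clears the publication bar."""
    rng = random.Random(seed)
    p_values = [lab_p_value(rng) for _ in range(n_labs)]
    published = [p for p in p_values if p < alpha]
    return p_values, published


p_values, published = simulate_labs()
share_significant = len(published) / len(p_values)

print(f"labs with p < 0.05: {share_significant:.1%}")
# The null is true in every lab, so every published result is a Type I error.
print(f"false positives among published results: {len(published)} of {len(published)}")
```

Roughly one lab in twenty clears the bar, and because nothing else gets printed, the published record in this toy world consists entirely of false positives.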

An alternative to the frequentist methodology of using p-values is the Bayesian method, which has its own problems. First, there are practical concerns, such as choosing initial parameters (priors) that can affect your results despite sometimes being arbitrarily chosen, and the high computational demand of calculating results (less of an issue in the 21st century, which is why the method is seeing a revival in the scientific community). Probably the main problem right now is that practitioners simply aren't familiar with how to employ Bayesian methods, so there's some cultural inertia preventing their immediate adoption.
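
As a small illustration of how the choice of initial parameters can move the answer, here is a sketch using a Beta prior on a success rate with binomial data. The data (7 successes, 3 failures) and both priors are invented for the example; nothing here comes from a real study.

```python
def posterior_mean(prior_a: float, prior_b: float, successes: int, failures: int) -> float:
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior."""
    # A Beta prior combined with binomial data gives a
    # Beta(prior_a + successes, prior_b + failures) posterior.
    post_a = prior_a + successes
    post_b = prior_b + failures
    return post_a / (post_a + post_b)


# Hypothetical data: 7 successes and 3 failures.
successes, failures = 7, 3

flat_prior_mean = posterior_mean(1, 1, successes, failures)         # uniform prior
skeptical_prior_mean = posterior_mean(10, 10, successes, failures)  # prior centered on 0.5

print(f"posterior mean, flat prior:      {flat_prior_mean:.3f}")
print(f"posterior mean, skeptical prior: {skeptical_prior_mean:.3f}")
```

With the same ten observations, the flat prior gives a posterior mean of about 0.667 and the stronger prior gives about 0.567. The gap shrinks as more data comes in, but with small samples the prior matters a lot.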


nkurz 2983 days ago

> while they are error prone a larger number of true results has been found than false results (according to recent research the ratio is 2:1 reproducible to non-reproducible)

It seems odd to talk about "results" as an average across all fields, rather than for a specific field. It's much more common for people to claim that psychology rather than physics has a reproducibility crisis, and thus I don't think it makes sense to talk about the combined reproducibility across both fields. What research are you referencing, and what fields did they look at? Given the differences across fields, if the average is 2:1 reproducible, I'd guess that some fields must be lower than 1:1.


prestonh 2983 days ago

You're right, it definitely depends on the field. The paper I am referencing looked at psychology, I believe. It is likely that a social science would have greater issues with reproducibility than a physical science.


da_chicken 2983 days ago

Oh, it's definitely damning. The real joke in the XKCD comic is that, if we assume each panel is a different study, the only study that would be published in a journal is the one where p < 0.05.

Originally it was intended that peer review in published journals and study reproduction would verify findings. In a small community where all results are treated equally, this works fine. In a world without data systems to organize data and documents, this was really the only reasonable method, too.

However, we don't live in that world anymore. The community isn't small, and information science and data processing are far more advanced. Unfortunately, since careers are built on novel research, reproduction is discouraged. Since studies where the null hypothesis is not rejected are typically not published at all, it can be difficult to even know what research has been done. There is also a large enough number of journals that researchers can venue-shop to some extent.

Many researchers are abandoning classic statistical models entirely in favor of Bayes factors [https://en.wikipedia.org/wiki/Bayes_factor]. Others are calling for publishing more studies where the null hypothesis is not rejected (some journals specializing in this like [http://www.jasnh.com/] have been started). Others are calling for all data to be made available for all studies to everyone (open science data movement). Others are trying to find ways to make reproduction of studies increasingly important.
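
For anyone who hasn't seen a Bayes factor computed, here is a minimal sketch for the simplest case I can think of: a coin-flip style experiment comparing H0 (success rate exactly 0.5) against H1 (success rate unknown, with a uniform prior). The data (15 successes in 20 trials) and the choice of a uniform prior are assumptions made purely for illustration.

```python
from math import comb


def bayes_factor_10(successes: int, trials: int) -> float:
    """Bayes factor for H1 (rate ~ Uniform(0, 1)) over H0 (rate == 0.5)."""
    # Marginal likelihood under H0: plain binomial probability at rate 0.5.
    likelihood_h0 = comb(trials, successes) * 0.5 ** trials
    # Marginal likelihood under H1: integrating the binomial likelihood over a
    # uniform prior gives 1 / (trials + 1) for any number of successes.
    likelihood_h1 = 1.0 / (trials + 1)
    return likelihood_h1 / likelihood_h0


# Hypothetical data: 15 successes in 20 trials.
bf10 = bayes_factor_10(15, 20)
print(f"BF10 = {bf10:.2f}")
```

For this made-up data the Bayes factor comes out at about 3.2, meaning the data are about three times more likely under H1 than under H0. A different prior under H1 would give a different number.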

It's really a very complicated problem.


PurpleBoxDragon 2983 days ago

As you point out, there is already a major issue when dealing with honest scientists who have to work in a publish-or-perish model where funding is based on getting results. But if we were to tweak the parameters so that there are at least some biased scientists and that the funding sources are biased toward certain results (other than just any result where p < 0.05), and we take into account a subset of society looking for 'scientific support' of their personal convictions, the issue becomes much worse.

Look at how much damage was done by science misleading people about nutrition with regard to carbs and fats. How often, especially from the social sciences, does some scientific finding get reported by popular media as a major finding which should have drastic effects on social/legal policy, only for the root experiment to be a single study with p < 0.05 where the authors caution against drawing any conclusions other than 'more research is needed'? Violence and media is a good example, and even more so when we consider the more prurient variants thereof.

I think this is the basis of why I am more willing to trust new research in physics than in sociology.


Fomite 2984 days ago

Effect estimation, rather than relying on p-values, is one approach that provides far more context than just "Is or is not significant".
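
A rough sketch of what reporting an effect estimate looks like in practice: instead of a yes/no verdict, you report the estimated difference and an interval around it. The two samples below are invented, and the interval uses a normal approximation for simplicity (with samples this small a t-based interval would be the more careful choice).

```python
from statistics import NormalDist, fmean, variance


def mean_difference_ci(treatment, control, confidence=0.95):
    """Estimated difference in means with a normal-approximation confidence interval."""
    estimate = fmean(treatment) - fmean(control)
    # Standard error of the difference between two independent sample means.
    standard_error = (
        variance(treatment) / len(treatment) + variance(control) / len(control)
    ) ** 0.5
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Hypothetical measurements, for illustration only.
control = [10.0, 12.0, 11.0, 13.0, 9.0, 12.0, 10.0, 11.0]
treatment = [12.0, 14.0, 13.0, 15.0, 11.0, 13.0, 12.0, 14.0]

estimate, lower, upper = mean_difference_ci(treatment, control)
print(f"estimated effect: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For this toy data the estimate is 2.0 with an interval of roughly 0.7 to 3.3, which says something about the size and uncertainty of the effect rather than only whether it crossed a threshold.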

Another is training your scientists, especially those outside the physical sciences, that effects likely aren't fixed in a meaningful sense (i.e. the effect of smoking on lung cancer isn't a universal constant in the way the speed of light is), at which point multiple estimates of the same effect from different groups and populations have value.
