by rjf72 2722 days ago
There's something that I think is relatively simple - design experiments not to try to prove your hypothesis, but to disprove it. I'm not talking about hypothesis vs null hypothesis here, but the experimental design itself from which the data is collected. There are lots of good examples, and this study happens to be a particularly good one. It basically looked at new topics posted on a forum for e.g. suicidal individuals and compared them to new topics on a forum for e.g. pensioners. The study found that a selection of 19 'absolutist' words occurred more frequently on the forums for one group than the other. It should be self-evident that there are a practically infinite number of potential confounding variables there.

In some cases confounding variables are impossible to escape, and you simply have to accept the fact that the science is going to be dodgy at best. But this is not really one of those cases. There are trivial and practically free ways you could really try to test the hypothesis that depressed individuals use absolutes more often than non-depressed individuals. For instance, why not give them a prompt and have them write a brief 300-word story? And even better, you can secretly steer the individuals in a given direction with what seems like a free-form prompt to try to further reduce confounding issues.

As an example, "Write a brief persuasive piece with the premise being that green is a more pleasant color than red." It seems open form, but it's not-so-secretly directing people in a broad but common direction to try to give you decent samples of speech where you continue to remove as many confounding variables as possible. Even better in my study design is that, similar to a twin study, it doesn't actually matter if your prompt would inherently nudge people towards using e.g. absolutes more often since you're comparing two individuals in the same 'environment.' What matters is not the absolute (har har) difference, but the relative difference. Suddenly you have an experimental situation where you're controlling for as much as you can outside of the behavior of the individuals themselves. And it would be an extremely cheap study that could even be done remotely.
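To make the "relative, not absolute" point concrete, here is a minimal sketch of how the same-prompt comparison could be scored. Everything in it is an assumption for illustration: the word list is a stand-in and not the study's 19 words, and `absolutist_rate` and `group_rate_gap` are hypothetical helper names.

```python
import re
from statistics import mean

# Illustrative stand-in list only; NOT the 19 words the study used.
ABSOLUTIST_WORDS = {"always", "never", "completely", "totally", "nothing", "everything"}


def absolutist_rate(text):
    """Fraction of words in `text` that appear in ABSOLUTIST_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in ABSOLUTIST_WORDS for w in words) / len(words)


def group_rate_gap(group_a_texts, group_b_texts):
    """Mean absolutist rate of group A minus that of group B.

    Both groups answer the same prompt, so any push the prompt itself
    gives towards absolutes lands on both groups alike. The gap between
    groups is the quantity of interest, not either group's raw rate.
    """
    rate_a = mean(absolutist_rate(t) for t in group_a_texts)
    rate_b = mean(absolutist_rate(t) for t in group_b_texts)
    return rate_a - rate_b
```

A real analysis would still need a proper significance test and a word list fixed before looking at any data. The sketch only shows where the prompt's own bias drops out.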

---

From a reader's perspective, and not a researcher's, there are a million telltale signs of p-hacking. The biggest one is studies, like this, that intentionally expose themselves to confounding variables. The average phrasal composition of new post topics on any non-general topic is going to radically differ between sites. Not controlling for that is not sloppy. It's far worse than sloppy, since there is absolutely no way these researchers could have been unaware of this confounding issue. It was an intentional choice and that deserves scrutiny. Given the current state of social sciences, I am no longer inclined to offer the benefit of the doubt.

Other telltale signs tend to be large numbers of variables, particularly when they are overly specific. With a large enough set of data you can find some commonality between any group of people. So for instance take a set of e.g. rich individuals and a set of non-rich individuals. If you just start collecting random data that could in no possible way be causal, you'll eventually find a subset that, for whatever reason, holds. People who were born on a Wednesday, went to a school with 5 letters in its first name, and have an 'E' in their last name are 92.3% more likely to be wealthy than everyone else! Of course the variables will never be so absurd, which can make implying a possible causal relationship sound not so absurd. Again taking this study, they chose 19 specific words to be used as their selection of absolutes, down from an original choice of some 300. And their criteria, even in what they acknowledge, are something that deserves substantial scrutiny. The worst part is that in cases like this you're also left just trusting the authors that at no point did their selection process involve 'peeking' at whether the words would 'prove' their hypothesis. And once again, I'm no longer inclined to offer the benefit of the doubt in studies of this sort.
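For anybody who wants to see the dredging effect rather than take it on faith, here is a rough simulation sketch. It assumes coin-flip "wealth" and coin-flip "traits", so by construction nothing is causal. The names `two_proportion_z` and `count_spurious_hits` are made up for the example, as are the sample sizes and the 1.96 cutoff (a nominal two-sided p < 0.05).

```python
import math
import random


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two proportions (pooled SE)."""
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (hits_a / n_a - hits_b / n_b) / se


def count_spurious_hits(n_people=1000, n_traits=200, seed=0):
    """Count random traits that look 'significantly' linked to wealth."""
    rng = random.Random(seed)
    # Wealth is a coin flip, so no trait can truly predict it.
    wealthy = [rng.random() < 0.5 for _ in range(n_people)]
    n_rich = sum(wealthy)
    n_poor = n_people - n_rich
    significant = 0
    for _ in range(n_traits):
        # Each trait is an independent coin flip per person.
        trait = [rng.random() < 0.5 for _ in range(n_people)]
        rich_hits = sum(t for t, w in zip(trait, wealthy) if w)
        poor_hits = sum(t for t, w in zip(trait, wealthy) if not w)
        z = two_proportion_z(rich_hits, n_rich, poor_hits, n_poor)
        if abs(z) > 1.96:  # nominal p < 0.05, two-sided
            significant += 1
    return significant
```

With 200 meaningless traits you would expect around 5% of them, so roughly ten, to clear the bar by chance alone. Report only those and you have a "finding." Whittling 300 candidate words down to 19 leaves room for the same thing unless the selection is provably blind to the outcome.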

And there are countless other signs. Another one, for instance, is seemingly odd exclusions/inclusions in the data. For instance, throughout this post I've stated that the study only considered new topics on the forums. And that's true. They chose not to consider responses for no legitimate reason. They state it was done "in the interest of simplicity and interpretability", which not only makes absolutely no sense, but introduces yet another potential confounding variable. Responses and original topics are going to have starkly different word choices.

It's hard to generalize, but maybe the easiest way is to remove good faith from the picture and, in a way, take that as your own little personal null hypothesis. Do the decisions and design taken within a study lend themselves towards (or away from) a connection with good faith - a study confident in its hypothesis and seeking to test it as stringently and rigorously as possible to try to ensure its integrity? Or do the decisions and design within a study seem to indicate individuals more interested in simply obtaining publishable data, as is often a means to an end of survival in academia today? A study more geared towards softly 'prodding' a hypothesis in a way likely to yield something that can be published? In many cases the answer there is immediately evident.