by Regardsyjc 2722 days ago
Do you have any recommended books or resources by Taleb or another author on how to identify or avoid methodological flaws like p-hacking, understanding correlation vs causation, and more?

I'm currently taking a data science course on Udemy and learning about chi squares, regression, and decision trees, but I'd love more information on best practices especially for experimental design.

5 comments

Taleb's book The Black Swan has sections comparing heavy-tailed prob. distributions with thin-tailed prob. distributions. The thesis is that you can't tell whether a distribution has fat tails until a fat-tailed event happens (which he calls a black swan). Also goes into the flaws of using induction, the flaws in ascribing causes when there are "silent witnesses", and so on. I don't think there's any specific practical advice. It's more "If you work in economics and other areas, your research may well be doomed." Taleb's advice is often "Don't do this", instead of "Do this".

I'm reading Antifragile now. I can't give you advice on experiment design or statistics off the top of my head. [EDIT] It's more about designing systems that benefit from the unpredictability of the world, instead of building systems that are harmed by unpredictability. [EDIT] It's an important companion to his previous book, because it gives positive advice on how to make decisions in a world that is too complex to be understood, instead of just negative advice.

There's also Skin in the Game (SITG), which I'd like to read. And there's a "Technical Incerto", which looks like a work in progress but involves concrete statistics.

[edit] He's also tweeting the contents of a new Data Science course he's teaching at NYU. Be warned that Taleb is a bit of an arsehole on Twitter.

> identify or avoid methodological flaws like p-hacking,

It is not trivial to identify them. If it were, there would be no replication crisis. Avoiding them is easy in theory: you need to decide on your math methods in advance. Generate random data before you start gathering real data, and write an R program to process that data. Test it, debug it, and when you get the real data, just feed it to this program without changing the program. It is harder in practice, though.
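To make the "write the analysis first" workflow concrete, here is a minimal sketch. It is in Python rather than the R mentioned above, and everything in it is an assumption for illustration: the names `analyze` and `fake_data`, the choice of a permutation test on the difference in means, and the sample sizes. The only point is the order of operations: the analysis is frozen and exercised on synthetic data before any real data exists.

```python
import random


def analyze(group_a, group_b, n_permutations=2000, seed=0):
    """Pre-specified analysis (hypothetical): two-sided permutation test
    on the difference in group means. Returns a p-value."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # +1 in numerator and denominator keeps the p-value strictly above zero
    return (hits + 1) / (n_permutations + 1)


def fake_data(n, shift, seed):
    """Synthetic stand-in for the real data: two normal samples,
    the second shifted by `shift` standard deviations."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    return a, b


# Dry run before any real data is collected: one dataset with no effect,
# one with a large planted effect, to check the pipeline behaves sensibly.
print("no effect:     p =", analyze(*fake_data(50, 0.0, seed=1)))
print("planted effect: p =", analyze(*fake_data(50, 2.0, seed=2)))
```

Once the dry run looks right, the real measurements go through `analyze` exactly as written. Any later change to the function is a deviation worth reporting.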

> understanding correlation vs causation

> I'd love more information on best practices especially for experimental design

I learnt it through experimental psychology: C. James Goodwin's "Research in Psychology"[1]. Some specifically psychological topics are covered (you might not be interested in the ethics of psychological research), and there is not a word about chi squares or the other math methods you mentioned (data processing is outside the scope of the book), but there is a lot about different experimental setups, with plenty of examples. IIRC there is a discussion of p-hacking too.

I believe this book is a good read for anyone interested in the design of experimental and/or correlational research in general, not just for psychologists.

[1] https://www.amazon.com/Research-Psychology-Methods-Design-8E...

There's something that I think is relatively simple: design experiments not to try to prove your hypothesis, but to disprove it. I'm not talking about hypothesis vs null hypothesis here, but about the experimental design itself, from which the data is collected. There are lots of good examples, and this study happens to be a particularly good one. It basically looked at new topics posted on a forum for e.g. suicidal people and compared them to new topics on a forum for e.g. pensioners. The study found that a selection of 19 'absolutist' words occurred more frequently on the forums for one group than the other. It should be self-evident that there is a practically infinite number of potential confounding variables there.

In some cases confounding variables are impossible to escape, and you simply have to accept the fact that the science is going to be dodgy at best. But this is not really one of those cases. There are trivial and practically free ways you could really try to test the hypothesis that depressed individuals use absolutes more often than non-depressed individuals. For instance, why not give them a prompt and have them write a brief 300-word story? Even better, you can secretly steer the individuals in a given direction with what seems like a free-form prompt, to further reduce confounding issues.

As an example, "Write a brief persuasive piece with the premise being that green is a more pleasant color than red." It seems open form, but it's not-so-secretly directing people in a broad but common direction to try to give you decent samples of speech where you continue to remove as many confounding variables as possible. Even better in my study design is that, similar to a twin study, it doesn't actually matter if your prompt would inherently nudge people towards using e.g. absolutes more often since you're comparing two individuals in the same 'environment.' What matters is not the absolute (har har) difference, but the relative difference. Suddenly you have an experimental situation where you're controlling for as much as you can outside of the behavior of the individuals themselves. And it would be an extremely cheap study that could even be done remotely.
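A toy simulation of that "relative difference" argument, as a rough sketch only. Every number here is made up for illustration (a baseline of 5 absolutist words per text, a true group effect of 0.5, a forum-context gap of 2, a prompt nudge of 3), and `compare_designs` is a hypothetical name. The forum comparison mixes the group effect with whatever differs between the two sites, while the shared prompt adds the same offset to both groups, so it cancels when you subtract.

```python
import random
import statistics


def compare_designs(true_effect=0.5, context_gap=2.0, prompt_nudge=3.0,
                    n=2000, seed=0):
    """Return (forum_estimate, prompt_estimate) of the group difference.
    All parameters are illustrative assumptions, not values from the study."""
    rng = random.Random(seed)
    baseline = 5.0

    def noise():
        return rng.gauss(0.0, 1.0)

    # Forum design: the two groups write in different contexts, so the
    # context gap is baked into the comparison alongside the true effect.
    forum_depressed = [baseline + context_gap + true_effect + noise()
                       for _ in range(n)]
    forum_control = [baseline + noise() for _ in range(n)]

    # Same-prompt design: both groups get the same nudge from the prompt,
    # so it drops out of the difference.
    prompt_depressed = [baseline + prompt_nudge + true_effect + noise()
                        for _ in range(n)]
    prompt_control = [baseline + prompt_nudge + noise() for _ in range(n)]

    forum_estimate = (statistics.mean(forum_depressed)
                      - statistics.mean(forum_control))
    prompt_estimate = (statistics.mean(prompt_depressed)
                       - statistics.mean(prompt_control))
    return forum_estimate, prompt_estimate


forum_est, prompt_est = compare_designs()
print("forum comparison estimate:", round(forum_est, 2))
print("same-prompt estimate:     ", round(prompt_est, 2))
```

In this setup the forum estimate lands near the true effect plus the context gap, and the same-prompt estimate lands near the true effect alone, even though the prompt itself pushes everyone's count up.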

---

From a reader's perspective, and not a researcher's, there are a million telltale signs of p-hacking. The biggest one is studies, like this one, that intentionally expose themselves to confounding variables. The average phrasal composition of new post topics on any non-general topic is going to differ radically between sites. Not controlling for that is not sloppy. It's far worse than sloppy, since there is absolutely no way these researchers were unaware of this confounding issue. It was an intentional choice, and that deserves scrutiny. Given the current state of the social sciences, I am no longer inclined to offer the benefit of the doubt.

Other telltale signs tend to be large numbers of variables, particularly when they are overly specific. With a large enough set of data you can find some commonality between any group of people. So for instance take a set of e.g. rich individuals and a set of non-rich individuals. If you just start collecting random data that could in no possible way be causal, you'll eventually find a subset that, for whatever reason, holds. People who were born on a Wednesday, went to a school with 5 letters in its first name, and have an 'E' in their last name are 92.3% more likely to be wealthy than everyone else! Of course, in real studies the variables are never so absurd, which can make the implied causal relationship sound plausible. Again taking this study, they chose 19 specific words to be used as their selection of absolutes, down from an original choice of some 300. And their criteria, even by what they acknowledge, deserve substantial scrutiny. The worst part is that in cases like this you're also left just trusting the authors that at no point did their selection process involve 'peeking' at whether the words would 'prove' their hypothesis. And once again, I'm no longer inclined to offer the benefit of the doubt in studies of this sort.
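The Wednesday/school/'E' example is easy to reproduce with pure noise. This is a sketch under made-up assumptions: 200 people, 40 coin-flip traits, wealth assigned by an independent coin flip, and a search over every three-trait combination that covers at least 10 people. The function name `dredge` and all the parameters are hypothetical. Nothing in the data is causal, yet the search reliably turns up a subgroup that looks far wealthier than the base rate.

```python
import random
from itertools import combinations


def dredge(n_people=200, n_traits=40, min_group=10, seed=0):
    """Search random, meaningless traits for the three-trait subgroup
    with the highest share of 'wealthy' people.
    Returns (base_rate, (trait_combo, subgroup_rate, subgroup_size))."""
    rng = random.Random(seed)
    # Wealth is an independent coin flip: no trait can possibly cause it.
    wealthy = [rng.random() < 0.5 for _ in range(n_people)]
    # holders[t] is the set of people who happen to have trait t.
    holders = [{i for i in range(n_people) if rng.random() < 0.5}
               for _ in range(n_traits)]
    base_rate = sum(wealthy) / n_people

    best = None
    for a, b, c in combinations(range(n_traits), 3):
        members = holders[a] & holders[b] & holders[c]
        if len(members) < min_group:
            continue  # skip tiny subgroups
        rate = sum(wealthy[i] for i in members) / len(members)
        if best is None or rate > best[1]:
            best = ((a, b, c), rate, len(members))
    return base_rate, best


base_rate, (combo, rate, size) = dredge()
print("base rate of wealthy people:", round(base_rate, 3))
print("best trait combo:", combo, "rate:", round(rate, 3), "size:", size)
```

With 40 traits there are 9,880 three-trait combinations to try, which is why something "significant-looking" always shows up. Cutting 300 candidate words down to 19 leaves similar room for the search to do the work.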

And there are countless other signs. Another one, for instance, is seemingly odd exclusions/inclusions in the data. For instance, throughout this post I've stated that the study only considered new topics on the forums. And that's true. They chose not to consider responses, for no legitimate reason. They state it was done "in the interest of simplicity and interpretability", which not only makes absolutely no sense but introduces yet another potential confounding variable. Responses and original topics are going to have starkly different word choices.

It's hard to generalize, but maybe the easiest way is to remove good faith from the picture and, in a way, take its absence as your own little personal null hypothesis. Do the decisions and design taken within a study point towards (or away from) good faith - a study confident in its hypothesis and seeking to test it as stringently and rigorously as possible to try to ensure its integrity? Or do the decisions and design within a study seem to indicate individuals more interested in simply obtaining publishable data, which is often a means of survival in academia today? A study more geared towards softly 'prodding' a hypothesis in a way likely to yield something that can be published? In many cases the answer there is immediately evident.