| > Say I have a simple table of outdoor temperatures and ice cream sales. What can the machinery of causal inference do for me in this situation? Not much. Causal inference works over networks of variables, specifically a DAG. But usually you know more than one variable association, so this is more an issue of pedagogy than the tool itself. Probably the shortest, most persuasive example I can give you is a logical resolution to Simpson's Paradox: when the correlation between two variables can change depending on whether you consider a third variable or not. The classic example is gender discrimination in college admissions. When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men. This is a paradoxical contradiction, and worrying in that your science is only as good as the dimensions your data captures. Worse, the data offers no clean way to say which is the correct answer: the aggregate or the total. Statisticians stumbled for a long while on this, and it's kind of wild that we were able to declare smoking causes cancer without a resolution to this. Pearl wrote a paper on how bayesian approaches resolve the paradox[1], but it does presume familiarity with terms like "colliders," "backdoor criterion" and "do-calculus." His main point is that causal inference techniques give us the language and tools to resolve the paradox that frequentist approaches do not. [1]: https://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf |
If every department favored women then the entire university would also favor women. Parity is guaranteed in that scenario. What happened in the Berkeley case is that not every department favored women, and women applied disproportionately to the departments with lower admissions rates (including some that didn't favor them), while men did the opposite.