I 100% agree about this blind spot. Most data science coursework avoids the very thing that makes it a science: explaining which change causes which effect. I've been surprised that, year after year, programs at so many "Schools of Data Science" keep gliding over this area, perhaps alluding to it in an early stats course, if at all. It's an important part of validating that your data-driven output or decision actually creates the change you hope for.

So many fields either do poor experimentation or none at all, and others are prevented from running the usual "full unrestricted RCT": medicine, financial services, and other regulated industries have legal constraints on what they can experiment with; in other cases, data privacy restricts the measures one can take. I've had many data folks throw up their hands if they can't do a full RCT and instead fall back on pre-post comparisons riddled with methodological errors. You can guess how most of those projects end up. (No, not every change needs a full test, and some things are easy to roll back. But think of how many others would have benefitted from some uncertainty reduction.)

Sure, "LLM everything" and "just gbm it!" and "ok, just need a new feature table and I'm done!" are all important and fun parts of a data science day. But if I can't show that a data-driven decision or output makes things better, then it's just noise. Causal modeling gets us there. It increases the impact of ML models that account for the effect of interventions, and it gives us evidence that we are helping (or harming). It's (IMO) necessary, but of course not sufficient. Lots of other great things are done by ML engineers, data scientists, data engineers, and the rest that have nothing to do with causal inference... But I keep thinking how much better things get when we apply a causal lens to our work. (And next on my list would be getting more data folks to understand slowly changing dimension tables, but that can wait for another time.)
Biologists, if not data scientists, are used to considering indirect evidence for causality. It's why we sometimes accept studies performed in other organisms as evidence for biology in humans, and why we sometimes accept research performed on post-mortem human tissue as representative of the biology of living humans, to name but two examples. A big part of a compelling, high-impact biology (or bioinformatics) paper is often the innovative ways one comes up with to show causality when a direct RCT is not feasible, and papers are frequently rejected because they don't do the follow-up experiments required to show causality.