Hacker News
by shahbazac 972 days ago
I’ve tried to understand causal inference several times and failed. Tutorials seem unnecessarily long-winded. I wish authors would give simple, to-the-point examples.

Say I have a simple table of outdoor temperatures and ice cream sales.

What can the machinery of causal inference do for me in this situation?

If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?

If I can use causal inference, what can it tell me? If I think of it as a function CA(data), can it tell me if the relationship is actually causal? Can it tell me the direction of the relationship? If there were more columns, could it return a graph of causal relationships and their strength? Or do I need to provide that graph to this function?

I know a wet pavement can be caused by rain or spilled water or that an alarm can go off due to an earthquake or a burglary. I have common sense. I also understand the basics of graph traversal from comp sci classes.

How do I practically use causal inference?

To the authors of future articles on this (or any technical tutorial), please explain the essence, the easy path, then the caveats and corner cases. Only then will abstract philosophizing make sense.

> Say I have a simple table of outdoor temperatures and ice cream sales. What can the machinery of causal inference do for me in this situation?

Not much. Causal inference works over networks of variables, specifically a directed acyclic graph (DAG). But in practice you usually know about more than one association between variables, so this is more an issue of pedagogy than of the tool itself.

Probably the shortest, most persuasive example I can give you is a logical resolution to Simpson's Paradox: when the correlation between two variables can change depending on whether you consider a third variable or not.

The classic example is gender discrimination in college admissions. When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men. This is a paradoxical contradiction, and worrying in that your science is only as good as the dimensions your data captures. Worse, the data offers no clean way to say which is the correct answer: the aggregate or the per-department breakdown. Statisticians stumbled on this for a long while, and it's kind of wild that we were able to declare that smoking causes cancer without a resolution to it.

Pearl wrote a paper on how Bayesian approaches resolve the paradox[1], but it does presume familiarity with terms like "colliders," "backdoor criterion" and "do-calculus." His main point is that causal inference techniques give us the language and tools to resolve the paradox that frequentist approaches do not.

[1]: https://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf

> When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men.

If every department favored women then the entire university would also favor women. Parity is guaranteed in that scenario. What happened in the Berkeley case is that not every department favored women, and women applied disproportionately to the departments with lower admissions rates (including some that didn't favor them), while men did the opposite.

Yes, apologies, what I meant by "favored" was that in every department, women applicants were more likely to get an admission than men. But I'm pretty sure the admission rate can still be lower for women overall than men overall, using exactly the same scenario you described. If the sociology department admits 10 percent of applicants and the physics department admits 90, it seems very easy for gender bias in applications to shift women towards 10 and men towards 90, even if the rate is a few percent higher for women.

I get your point now. You're quite right that you can construct scenarios that arbitrarily favor men in the aggregate but women in specific departments, given the right ratio of applicants.
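
Here is a minimal numeric sketch of the scenario just described. The counts are invented and are not the actual Berkeley data. Two hypothetical departments are used: "sociology" admits roughly 10% of applicants and "physics" roughly 90%. In each one, women are admitted at a slightly higher rate than men, but most women apply to sociology and most men apply to physics.

```python
# Hypothetical (applicants, admitted) counts; invented numbers, not Berkeley's data.
ADMISSIONS = {
    "sociology": {"women": (900, 108), "men": (100, 10)},
    "physics": {"women": (100, 92), "men": (900, 810)},
}


def per_department_rates(data):
    """Admission rate for each gender within each department."""
    return {
        dept: {gender: admitted / applied for gender, (applied, admitted) in groups.items()}
        for dept, groups in data.items()
    }


def aggregate_rates(data):
    """Admission rate for each gender after pooling all departments."""
    totals = {}
    for groups in data.values():
        for gender, (applied, admitted) in groups.items():
            prev_applied, prev_admitted = totals.get(gender, (0, 0))
            totals[gender] = (prev_applied + applied, prev_admitted + admitted)
    return {gender: admitted / applied for gender, (applied, admitted) in totals.items()}


if __name__ == "__main__":
    for dept, rates in per_department_rates(ADMISSIONS).items():
        print(dept, {g: round(r, 2) for g, r in rates.items()})
    print("overall", {g: round(r, 2) for g, r in aggregate_rates(ADMISSIONS).items()})
```

With these numbers, women lead by two points in each department (12% vs 10%, and 92% vs 90%). Pooled over both departments, however, women are admitted at 20% and men at 82%, purely because of where each group applied.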

> Or do I need to provide that graph to this function?

You need to do that, and the math can help you measure how much each arrow contributes. The idea that you need to provide your model of the world is strangely not a key part of most introductions, but it’s crucial.
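
Here is a minimal sketch of what "you provide the graph, the math measures the arrows" can look like. It assumes a linear model with Gaussian noise, which is a simplification. The graph, variable names and coefficients are all hypothetical: temperature raises ice cream sales directly and also through a "beach_crowd" variable. The code simulates data from that graph, then regresses each node on the parents declared for it. Under these assumptions, the fitted coefficients estimate the strength of each arrow.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical causal graph supplied by the analyst: child -> list of parents.
GRAPH = {
    "temperature": [],
    "beach_crowd": ["temperature"],
    "sales": ["temperature", "beach_crowd"],
}


def simulate(n, seed=0):
    """Draw data from a linear model that follows GRAPH (coefficients are invented)."""
    rng = random.Random(seed)
    data = {name: [] for name in GRAPH}
    for _ in range(n):
        t = rng.gauss(25, 5)
        crowd = 2.0 * t + rng.gauss(0, 3)
        s = 1.5 * t + 0.5 * crowd + rng.gauss(0, 2)
        data["temperature"].append(t)
        data["beach_crowd"].append(crowd)
        data["sales"].append(s)
    return data


def arrow_strengths(data, graph):
    """Regress each node on its declared parents; return {(parent, child): coefficient}."""
    strengths = {}
    for child, parents in graph.items():
        if not parents:
            continue
        rows = list(zip(*(data[p] for p in parents)))
        coefs = ols(rows, data[child])
        for parent, coef in zip(parents, coefs[1:]):
            strengths[(parent, child)] = coef
    return strengths


if __name__ == "__main__":
    for edge, coef in arrow_strengths(simulate(5000), GRAPH).items():
        print(edge, round(coef, 2))
```

The graph is what tells the code which regressions to run. In a linear model, effects also multiply along paths: the total effect of temperature on sales here is 1.5 + 0.5 × 2 = 2.5.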

> outdoor temperatures and ice cream sales

That’s too simple: a simple regression can handle that. Causal inference can handle cases with three variables, assuming you provide an interaction graph. Say your ice cream truck goes either to a fancy neighborhood or to a working-class plaza. You decide where to go after observing the weather, so you know that wealth and weather influence sales, but sales can’t influence the other two. Assuming you have data for all four cases (sunny/poor, sunny/rich, rainy/poor, rainy/rich), you can separate the two effects.
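
Here is a minimal sketch of the truck example. Every probability and effect size in it is invented for illustration. Weather influences both the route choice and sales, and the route (rich vs poor neighborhood) also influences sales. Because weather is a common cause of both, the naive rich-minus-poor difference in sales is inflated. Averaging the within-weather differences, weighted by how often each kind of weather occurs, recovers the route effect. This is the backdoor adjustment formula with weather as the adjustment set.

```python
import random
from statistics import mean


def simulate(n, seed=1):
    """Hypothetical (sunny, rich, sales) rows drawn from a known causal model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sunny = rng.random() < 0.5
        # The driver picks the route after seeing the weather: weather -> route.
        rich = rng.random() < (0.8 if sunny else 0.3)
        # True effects built into the simulation: +30 for sun, +20 for the rich route.
        sales = 50 + 30 * sunny + 20 * rich + rng.gauss(0, 5)
        rows.append((sunny, rich, sales))
    return rows


def naive_route_effect(rows):
    """Difference in mean sales between routes, ignoring weather (confounded)."""
    return mean(s for _, r, s in rows if r) - mean(s for _, r, s in rows if not r)


def adjusted_route_effect(rows):
    """Sum over weather w of P(w) * (E[sales | rich, w] - E[sales | poor, w])."""
    total = 0.0
    for w in (True, False):
        stratum = [(r, s) for sunny, r, s in rows if sunny == w]
        p_w = len(stratum) / len(rows)
        diff = mean(s for r, s in stratum if r) - mean(s for r, s in stratum if not r)
        total += p_w * diff
    return total


if __name__ == "__main__":
    data = simulate(20000)
    print("naive:", round(naive_route_effect(data), 1))
    print("adjusted:", round(adjusted_route_effect(data), 1))
```

The naive estimate lands around 35, because rich-route days are disproportionately sunny. The adjusted estimate lands near the 20 built into the simulation. The "data for all four cases" condition matters here: if some weather/route combination never occurs, the within-stratum difference for that weather cannot be computed.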

> > outdoor temperatures and ice cream sales
>
> That’s too simple: a simple regression can handle that.

Not quite. Regression by itself will not answer the causal (or equivalently, the counterfactual) question.

I strongly suspect you already know this and were elaborating on a related point. But just for the sake of exposition, let me add a few words for the HN audience at large.

Let me give an example. In an email corpus, mails that begin with "Honey sweetheart," will likely have a higher-than-baseline open rate. A regression on word features will latch on to that. However, if your employer starts opening its emails with "Honey sweetheart," that will not increase the open rate of corporate communications.

Causal or counterfactual estimation is fundamentally about how a dependent variable responds to interventional changes in a causal variable. Regression and, relatedly, conditional probabilities are about 'filtering' the population on some predicate.

An email corpus when filtered upon the opening phrase "Honey sweetheart" may have disproportionately high email open rates, but that does not mean that adding or adopting such a leading phrase will increase the open rate.

Similarly, regressing dark hair as a feature against skin cancer propensity will catch an anti-correlation effect. Dyeing blonde hair dark will not reduce melanoma propensity.

Your model needs to introduce a third piece of information: whether an email is a corporate communication, or else whether the phrase was a deliberate intervention.
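
Here is a minimal sketch of the difference between filtering and intervening, using an invented model with made-up probabilities. The third variable is whether the sender is personal or corporate. It drives both how often the opener is used and the open rate, while the opener itself has no causal effect on opening. Filtering on the phrase shows a large lift. Forcing the phrase onto every email, which mimics an intervention (Pearl's do-operator), changes nothing.

```python
import random


def simulate(n, seed=2, force_phrase=None):
    """Hypothetical emails as (is_personal, has_phrase, opened) tuples.

    force_phrase=None samples the phrase naturally; True/False sets it for every
    email, mimicking an intervention do(phrase = value).
    """
    rng = random.Random(seed)
    emails = []
    for _ in range(n):
        is_personal = rng.random() < 0.2
        if force_phrase is None:
            has_phrase = rng.random() < (0.3 if is_personal else 0.01)
        else:
            has_phrase = force_phrase
        # Open probability depends only on the sender type, never on the phrase.
        opened = rng.random() < (0.9 if is_personal else 0.2)
        emails.append((is_personal, has_phrase, opened))
    return emails


def open_rate(emails):
    return sum(opened for _, _, opened in emails) / len(emails)


def observed_lift(emails):
    """Filtering: P(open | phrase) - P(open | no phrase)."""
    with_phrase = [e for e in emails if e[1]]
    without_phrase = [e for e in emails if not e[1]]
    return open_rate(with_phrase) - open_rate(without_phrase)


def interventional_lift(n):
    """Intervening: P(open | do(phrase)) - P(open | do(no phrase))."""
    return open_rate(simulate(n, force_phrase=True)) - open_rate(simulate(n, force_phrase=False))


if __name__ == "__main__":
    print("filtered lift:", round(observed_lift(simulate(20000)), 2))
    print("interventional lift:", round(interventional_lift(20000), 2))
```

The filtered lift comes out around 0.5, because phrase-bearing emails are mostly personal. The interventional lift is exactly zero. The two forced runs share a seed and consume the same random draws, so they differ only in the phrase, which never enters the open probability.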

My understanding (which might be out of date) is that the tools are weak. Ideally you would feed in tabular data and get back a digraph of the causal structure between variables. You can try this, but the tools don't work reliably yet. Otherwise everyone would use them.

Agreed. AFAICT, in practice you set up your own causal graphs and test them. This feels very 1950s academia.

Interestingly, folks are finally doing more realistic experiments in the causal equivalent of architecture search, and genAI is giving these efforts a second wind. It still feels like it's at the toy stage, or for academics and researchers with a lot of time on their hands, rather than relevant for most data scientists.

I'm still on the sidelines, but I keep checking in in case it's finally practical for our users.

Same here. I check in every year or so because it would be fantastic to have.

> Say I have a simple table of outdoor temperatures and ice cream sales.

You have more than that! You have knowledge about the world!

> What can the machinery of causal inference do for me in this situation?

Well (I’m being purposefully pedantic here), you haven’t really asked a question yet. The first thing it can do is help you formulate one. It can answer questions like, “How can I anticipate how the things I have and haven’t measured will affect the estimates I’m interested in making?”

> If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?

The first thing you need to do is articulate what you’re actually interested in. Then you need to be explicit about the causal relationships between things relevant to those questions. The big thing (to me) is that particular causal structures have testable conditional independence structures and by assessing these, you can build evidence for or against particular diagrams of the context.
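
Here is a minimal sketch of how different diagrams imply different testable independences. It assumes linear relationships with Gaussian noise, so that correlation and partial correlation are reasonable tests; the variable names and coefficients are invented. In a chain A -> B -> C, A and C are correlated but become independent once you condition on B. In a collider A -> B <- C, A and C are independent but become dependent once you condition on B.

```python
import math
import random


def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def chain(n, rng):
    """Hypothetical chain A -> B -> C."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [ai + rng.gauss(0, 1) for ai in a]
    c = [bi + rng.gauss(0, 1) for bi in b]
    return a, b, c


def collider(n, rng):
    """Hypothetical collider A -> B <- C."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    c = [rng.gauss(0, 1) for _ in range(n)]
    b = [ai + ci + rng.gauss(0, 1) for ai, ci in zip(a, c)]
    return a, b, c


if __name__ == "__main__":
    rng = random.Random(3)
    for name, make in (("chain", chain), ("collider", collider)):
        a, b, c = make(10000, rng)
        print(name, "corr(A,C) =", round(corr(a, c), 2),
              "| partial corr given B =", round(partial_corr(a, c, b), 2))
```

For the chain, corr(A,C) is about 0.58 and the partial correlation is near 0. For the collider, corr(A,C) is near 0 and the partial correlation is about -0.5. So a diagram that predicts "A independent of C given B" loses support when the data shows a clearly nonzero partial correlation. Real data would need a proper significance test, and nonlinear relationships would need a different independence test.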

Judea Pearl's The Book of Why gives more practical, easy-to-understand examples; I recommend it.

It's pretty simple. You cannot infer causality from observational data, no matter how sophisticated your statistical tools are.

You need to perform a properly controlled experiment to infer causality. And even then it's hard.

Inferring causality from observational data is cargo cult science.

TL;DR: Causal inference is a complex topic, not a simple tool.

How's the ice cream example better than the sugary snacks example given in the article?

Here's the part about needing to add more columns to the data:

> When dealing with a causal question, it’s crucial to include variables known as confounders. These are variables that can influence both the treatment and the outcome. By including confounding variables, we can better isolate and estimate the true causal effect of the treatment. Failing to add or account for confounding variables may lead to incorrect estimates.
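
Here is a minimal sketch of what the quoted passage means, using invented variables. A hypothetical "traffic" confounder drives both how much a shop spends on snack promotion (the treatment) and its sales (the outcome). Regressing sales on promotion alone gives a biased slope. Adjusting for the confounder recovers the promotion effect of 2.0 built into the simulation. The adjustment is done by partialling traffic out of both variables (the Frisch-Waugh-Lovell result, which gives the same slope as a regression on both variables).

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def residuals(y, x):
    """What is left of y after a simple regression on x."""
    n = len(x)
    b = slope(x, y)
    mx, my = sum(x) / n, sum(y) / n
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def simulate(n, seed=4):
    """Hypothetical data: traffic -> promo, traffic -> sales, promo -> sales."""
    rng = random.Random(seed)
    traffic, promo, sales = [], [], []
    for _ in range(n):
        t = rng.gauss(100, 20)                      # confounder
        p = 0.5 * t + rng.gauss(0, 10)              # busier shops promote more
        s = 2.0 * p + 3.0 * t + rng.gauss(0, 10)    # true promo effect is 2.0
        traffic.append(t)
        promo.append(p)
        sales.append(s)
    return traffic, promo, sales


if __name__ == "__main__":
    traffic, promo, sales = simulate(10000)
    print("omitting the confounder:", round(slope(promo, sales), 2))
    adjusted = slope(residuals(promo, traffic), residuals(sales, traffic))
    print("adjusting for the confounder:", round(adjusted, 2))
```

With these made-up parameters, the naive slope comes out around 5, because promotion also proxies for traffic. The adjusted slope comes out close to 2.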

> How's the ice cream example better than the sugary snacks example given in the article?

Not the OP, but because the article never explains how its basic hypothetical example actually works(!)

> You want to know how much your sales would be in a parallel world where kids were stuck with bland snacks compared to your sweet treats. This is where causal inference steps in to provide the solution.

(nice graph follows)

So how is that done?

> TL;DR: Causal inference is a complex topic, not a simple tool.

The simple version using graphical models and joint probabilities isn't difficult to explain or teach. The issue is that to do anything useful with it at scale, you need either MCMC or variational inference, and that's an entirely different can of worms. For medical datasets you rarely have "scale"; instead you have very few sample cases and a large expert model (the doctor/specialist).
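
Here is a minimal sketch of the "simple version," using the wet-pavement example from the original question with made-up probabilities. It is a three-node discrete graph (rain -> wet <- spill). The joint probability is written as the product of each node's table given its parents, and queries are answered exactly by enumerating every assignment. An intervention is modeled by replacing wet's table with a point mass. Brute-force enumeration like this grows exponentially with the number of variables, which is where approximate methods such as MCMC or variational inference come in.

```python
from itertools import product

# Hypothetical conditional probability tables for rain -> wet <- spill.
P_RAIN = 0.2
P_SPILL = 0.1
P_WET = {  # P(wet=True | rain, spill)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def bernoulli(p, value):
    return p if value else 1.0 - p


def joint(rain, spill, wet, do_wet=None):
    """P(rain, spill, wet) as a product of each node's table given its parents.

    If do_wet is set, wet's table is replaced by a point mass (an intervention),
    cutting the arrows from rain and spill into wet.
    """
    p = bernoulli(P_RAIN, rain) * bernoulli(P_SPILL, spill)
    if do_wet is None:
        p *= bernoulli(P_WET[(rain, spill)], wet)
    else:
        p *= 1.0 if wet == do_wet else 0.0
    return p


def prob_rain(evidence, do_wet=None):
    """P(rain=True | evidence) by enumerating all 2**3 assignments."""
    num = den = 0.0
    for rain, spill, wet in product((True, False), repeat=3):
        world = {"rain": rain, "spill": spill, "wet": wet}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(rain, spill, wet, do_wet)
        den += p
        if rain:
            num += p
    return num / den


if __name__ == "__main__":
    print("P(rain | wet)        =", round(prob_rain({"wet": True}), 3))
    print("P(rain | wet, spill) =", round(prob_rain({"wet": True, "spill": True}), 3))
    print("P(rain | do(wet))    =", round(prob_rain({"wet": True}, do_wet=True), 3))
```

Seeing wet pavement raises the probability of rain from 0.2 to about 0.645. Also learning about a spill "explains away" the wetness, dropping it to about 0.236. Wetting the pavement yourself leaves it at 0.2, which is the observational-versus-interventional distinction the thread keeps coming back to.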