Hacker News
by sjg007 2951 days ago
I mean there are structural equation models that preceded Pearl's work, which Pearl cites. And before that there was the Neyman-Rubin work; Neyman first wrote about it in 1923. I think Pearl's principal insight was to use graph theory to reason about either Bayesian things (see probabilistic graphical models) or causal things (see causality). This is a fairly fundamental insight.
2 comments

Pearl's attention to the do conditionality, i.e., P(Y|do(X)) versus P(Y|X), is interesting and important in a certain sense, but I'm not sure it has really resolved debates about causality in any practical sense.

I don't really mean that in a dismissive sense, just to point out that his notation just begs the question of what do(X) means, in terms of why it is actually important. To me it just kind of formalizes a certain notation and kicks the hard theoretical can down the road.

In the books and papers I've read of Pearl's, he makes reasonable logical arguments for certain types of causal inferences, but when, in discussion with colleagues, we've tried to think of how they would be implemented outside the context of an experiment, we've been sort of at a loss. I say this as someone who identifies with observational study professionally, but who recognizes the importance of experiments.

My broader point is that I think Pearl's do-calculus can be reexpressed in traditional graph theory/structural equations/statistics without introducing anything new. In that sense, although I think his writings have drawn attention to important issues, I don't think they have solved anything.

> just to point out that his notation just begs the question of what do(X) means,

It's very formally specified. The key object of study in Pearl-style causal inference is a structural causal model. A structural causal model is composed of equations like the following:

Y = f(X, Z, U)

Here, X and Z are observed inputs, i.e., other random variables in your system, and U is unobserved. In other words, "Y is computed by a deterministic function which takes an unknown random input."

Then, P(Y = 1 | do(X = 1, Z = 2)) is defined as P(f(1, 2, U) = 1).
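
To make that definition concrete, here is a minimal sketch on a toy model. Everything specific in it is an assumption for illustration and not something from Pearl or this thread: the choice of f, the uniform distribution of U, and the observational equations for X and Z (the hypothetical `observe_x` and `observe_z`). It computes P(Y = 1 | do(X = 1, Z = 2)) by plugging the values into f and averaging over U, and compares that with ordinary conditioning, where observing X = 1 also tells you something about U.

```python
from fractions import Fraction

# Hypothetical toy model; none of these specifics come from the thread.
# Distribution of the unobserved input U.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f(x, z, u):
    """Structural equation for Y: deterministic given its inputs."""
    return 1 if x + u >= z else 0

def observe_x(u):
    """Assumed observational equation for X: X is driven by U (confounding)."""
    return u

def observe_z(u):
    """Assumed observational equation for Z: a constant, to keep the toy small."""
    return 2

def p_do(y, x, z):
    """P(Y = y | do(X = x, Z = z)), i.e. P(f(x, z, U) = y)."""
    return sum(p for u, p in P_U.items() if f(x, z, u) == y)

def p_cond(y, x, z):
    """P(Y = y | X = x, Z = z) under the observational equations.

    Assumes the event X = x, Z = z has nonzero probability.
    """
    match = {u: p for u, p in P_U.items()
             if observe_x(u) == x and observe_z(u) == z}
    total = sum(match.values())
    return sum(p for u, p in match.items() if f(x, z, u) == y) / total

print(p_do(1, 1, 2))    # 1/2
print(p_cond(1, 1, 2))  # 1
```

In this toy model the two quantities differ: intervening leaves U at its original distribution, while conditioning on X = 1 restricts attention to the cases where U = 1.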

My gripe isn't with the importance of what Pearl published; of course it's important. I just mean that the concept of conditioning on how the target or observational outcome varies when you intentionally vary some conditional variables is not new at all in machine learning. Causal models would just be one more take on it, with interconnections and differences and pros and cons compared with what came before. But it's always disingenuously framed like, "ML practitioners never knew about doing this, but it's the only way to truly go further with our models."