by mcguire 2671 days ago
I'd like to say that the author has been reading The Book of Why, but it seems that he hasn't, because he missed the punch line of the section on the paradox: you need a causal model to separate the two branches of the paradox. It's as easy to construct examples where the overall view is correct as it is to construct examples where the separate views are.
2 comments

I'm unclear: what was the incorrect claim you're saying the author made?
The parent is not saying the author made an incorrect claim. They are saying that the author did not continue the argument to the conclusion that someone else had already reached: that causal models are what tell you when you can combine datasets and when you can't.
> causal models are what tell you when you can combine datasets and when you can't.

But then the causal model is subjective, right? What if there are two different causal models, and it cannot be known a priori which is the "true" one?

Can the selection of the causal model be used to justify the dataset, in order to push a particular agenda?

Your job when analysing data is simply to enumerate the possibilities and assign likelihoods to them if possible. If two models fit equally well, you're supposed to write them both down in the hope that someone will collect further data to distinguish between them.

If you're cutting holes in your report for political reasons, that's just not doing the job. That's what pundits are paid to do, not (ideally at least) scientists. Fraud is easy to commit, and the fact that it's possible is not that hard of a philosophical issue.

How do you tell that a paper containing conclusions to support an agenda is written with correct scientific rigor, rather than fraud? Using Simpson's paradox, one can obfuscate their biases by making the desired conclusion drop out of the data.
Simpson's paradox is about a data conflict between an overall view and a more specific view. For example, in the kidney stone scenario, say you find that treatment A is more successful overall, but treatment B is more successful at treating both small stones and big stones when the data is broken down by stone size. The article indicates that the specific view is always correct, so treatment B should be used in the future, whereas the commenter is saying that context is important to determine which treatment should be used.
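
To make the conflict concrete, here is a minimal Python sketch of that scenario. The counts are illustrative assumptions arranged to reproduce the pattern described (A ahead overall, B ahead for both stone sizes); they are not the article's data, and the function names are made up for this example.

```python
from fractions import Fraction

# Hypothetical counts: (successes, patients) per treatment and stone size.
# Chosen only so that A wins overall while B wins within each stone size.
counts = {
    "A": {"small": (234, 270), "large": (55, 80)},
    "B": {"small": (81, 87), "large": (192, 263)},
}

def subgroup_rates(treatment):
    """Success rate within each stone-size group (the 'specific view')."""
    return {
        size: Fraction(successes, patients)
        for size, (successes, patients) in counts[treatment].items()
    }

def overall_rate(treatment):
    """Success rate with stone sizes pooled together (the 'overall view')."""
    successes = sum(s for s, _ in counts[treatment].values())
    patients = sum(n for _, n in counts[treatment].values())
    return Fraction(successes, patients)

for treatment in counts:
    by_size = ", ".join(
        f"{size}: {float(r):.3f}" for size, r in subgroup_rates(treatment).items()
    )
    print(f"{treatment}  overall: {float(overall_rate(treatment)):.3f}  {by_size}")
```

The reversal comes from the group sizes: in these made-up numbers, A is given mostly to small-stone patients, who do well under either treatment, so pooling flatters A. Nothing in the arithmetic says which view to trust; that is the point the commenter is making.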
Exactly. With a causal model (which can be validated independently) you have a principled reason for choosing which variables to control for.
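
As a sketch of what "choosing which variables to control for" looks like formally: assume, purely for illustration, a causal graph in which stone size $Z$ affects both the choice of treatment $X$ and the outcome $Y$, so that $Z$ is a confounder satisfying Pearl's back-door criterion. That graph is an assumption, not something established in this thread. Under it, the effect of treatment is given by the adjustment formula, which averages the within-group results:

```latex
P(Y = 1 \mid \mathrm{do}(X = x)) \;=\; \sum_{z} P(Y = 1 \mid X = x,\, Z = z)\, P(Z = z)
```

If the graph were different, for example if $Z$ were a consequence of the treatment rather than a cause of it, adjusting for $Z$ would be the wrong move and the pooled comparison $P(Y = 1 \mid X = x)$ would be the one to use. The data alone cannot tell these two cases apart.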
Post author here. Can confirm I’ve not read The Book of Why!

I’ll add it to my reading list.

A warning: it's seriously self-congratulatory. But I don't know of anything better.