by nerdponx
2932 days ago
You take a random sample of 1,000 white men between the ages of 45 and 55, who have lived in New England for at least 10 years, with no known history of heart disease. You randomly split them in half. You give half of them a supplement to take every day for 12 months, and you give the other half a placebo. If the number of heart attacks in the placebo sample is greater than in the treatment sample, you have some believable evidence that the supplement can help prevent unexpected heart attacks, at least in white men in their 40s and 50s. The idea is that you've controlled for just about every factor that could affect the rate of unexpected heart attacks, or those factors are evenly distributed across both samples, on average, because you were careful to assign treatment randomly. Therefore, if there is a difference between the groups, on average, it must be because of the treatment that you introduced to one group and not the other. I'm hand-waving, of course, and I'm sure there are medical researchers out there who will read my study design and laugh at how badly controlled it is. But that should give you the general picture of one common method used to perform "causal" analysis.
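To make the design above concrete, here is a minimal Python sketch of the split-in-half experiment. Everything in it is an assumption for illustration: the function names, the sample size default, and especially the two heart-attack risks (`base_risk`, `treated_risk`), which are made-up numbers rather than real medical figures.

```python
import random

def split_in_half(unit_ids, rng):
    """Randomly assign half of the units to treatment and half to placebo."""
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])

def risk_difference(outcomes, treated, control):
    """Placebo-group event rate minus treatment-group event rate."""
    rate_treated = sum(outcomes[u] for u in treated) / len(treated)
    rate_control = sum(outcomes[u] for u in control) / len(control)
    return rate_control - rate_treated

def simulate_trial(n=1000, base_risk=0.06, treated_risk=0.03, seed=0):
    """Simulate one trial with hypothetical risks (illustrative values only)."""
    rng = random.Random(seed)
    treated, control = split_in_half(range(n), rng)
    outcomes = {}
    for u in range(n):
        # Assumption: the supplement lowers each treated unit's risk.
        risk = treated_risk if u in treated else base_risk
        outcomes[u] = 1 if rng.random() < risk else 0
    return treated, control, risk_difference(outcomes, treated, control)

if __name__ == "__main__":
    treated, control, diff = simulate_trial()
    print(f"{len(treated)} treated, {len(control)} placebo, risk difference {diff:+.3f}")
```

A positive risk difference means more heart attacks in the placebo group. A real analysis would also ask how large a difference could arise by chance alone, which this sketch does not do.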
Another technique we might use is a blocked (or stratified) random sample. Knowing that there will be both smokers and non-smokers, we recruit two separate samples, one of each, and randomize treatment assignment within each. This ensures that smoking status does not predict treatment assignment, and it guards against the chance imbalance in smoking status that a single overall randomization can produce.
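As a rough sketch of the blocked randomization just described, the following Python assigns treatment separately within each block. The function name, the `"treatment"`/`"placebo"` labels, and the example block sizes are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def blocked_assignment(units, block_of, rng):
    """Randomize treatment separately within each block (e.g. smokers, non-smokers)."""
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)

    assignment = {}
    for label in sorted(blocks):  # fixed block order keeps a seeded run reproducible
        members = blocks[label]
        rng.shuffle(members)
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = "treatment"
        for u in members[half:]:
            assignment[u] = "placebo"
    return assignment

if __name__ == "__main__":
    units = list(range(100))
    # Hypothetical sample: the first 30 units are smokers.
    block_of = {u: "smoker" if u < 30 else "non-smoker" for u in units}
    assignment = blocked_assignment(units, block_of, random.Random(1))
    smokers_treated = sum(1 for u in units if u < 30 and assignment[u] == "treatment")
    print("smokers in treatment:", smokers_treated)
```

Because each block is split in half on its own, the share of smokers is the same in both arms by construction, rather than only on average.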
We could also mitigate the imbalance that does exist by doing a matched analysis, where each treated unit is paired with the control unit that looks most like him (some control units may be reused). Or we could match on propensity scores. Or we could weight by inverse propensity scores. Or we could weight using covariate balancing. Or...
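Two of the options above can be sketched in a few lines of Python: nearest-neighbor matching with replacement, and inverse propensity weighting. This is a simplified illustration, not a recommended implementation. The function names, the single-covariate distance, and the toy data are assumptions, and the propensity scores are taken as given here, whereas in practice they would be estimated (for example with a logistic regression).

```python
def match_with_replacement(treated, controls, distance):
    """Pair each treated unit with its closest control; controls may be reused."""
    return {t: min(controls, key=lambda c: distance(t, c)) for t in treated}

def matched_effect(pairs, outcome):
    """Average treated-minus-matched-control outcome difference."""
    diffs = [outcome[t] - outcome[c] for t, c in pairs.items()]
    return sum(diffs) / len(diffs)

def ipw_difference(outcome, is_treated, propensity):
    """Weighted difference in mean outcomes, weights normalized within each group."""
    w_t = {u: 1.0 / propensity[u] for u in outcome if is_treated[u]}
    w_c = {u: 1.0 / (1.0 - propensity[u]) for u in outcome if not is_treated[u]}
    mean_t = sum(w * outcome[u] for u, w in w_t.items()) / sum(w_t.values())
    mean_c = sum(w * outcome[u] for u, w in w_c.items()) / sum(w_c.values())
    return mean_t - mean_c

if __name__ == "__main__":
    # Hypothetical toy data: match on age only.
    age = {"t1": 50, "t2": 46, "c1": 51, "c2": 60}
    outcome = {"t1": 1, "t2": 0, "c1": 1, "c2": 0}
    pairs = match_with_replacement(
        ["t1", "t2"], ["c1", "c2"], lambda a, b: abs(age[a] - age[b])
    )
    print(pairs, matched_effect(pairs, outcome))
```

In this toy example both treated units are matched to the same control, which is what "some control units may be reused" looks like in practice.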
My point in doing this info dump is to a) back up nerdponx's example, which is great, and b) illustrate that statisticians have taken the problem of causal analysis seriously and developed techniques appropriate for answering causal questions, so there is a lot here to learn.
People on the CS side of things tend to use Pearl's DAGs for conceptualizing this stuff. I'm on the stats/econ side of things, so I use Neyman-Rubin. They're equivalent. Allow me to suggest Imbens and Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, as a good textbook that we assign to graduate students learning this stuff. Some of my students tell me the "Causal Inference Mixtape" is popular among people who want less statistical theory and more "what should I do as a practitioner". A virtue of both the resources I just mentioned is that they discuss not just experimental designs but also observational data studies, like the one the original post would have wanted to conduct.