Hacker News
by BugsJustFindMe 488 days ago
> Now, a lot of these studies try to "control for" the problem I just stated - they say things like "We examined the effect of X and Y, while controlling for Z [e.g., how wealthy or educated the people/countries/whatever are]." How do they do this? The short answer is, well, hm, jeez.

You mean they don't cluster the data into sets of overlapping bins where the controlled attribute has approximately the same value and then look for the presence of an XY relationship within the bins instead of across them?


No. What they actually do is run a regression with both X and Z among the independent variables, and then look solely at the coefficient on X. (As mentioned in the article.) Including Z as an independent variable alongside X "controls for" it in that the coefficient on X is now supposed to exclude any effect from Z (since any Z effect should go into the Z coefficient). How well this works is something I don't know enough to answer.
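To make the regression version concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the `ols` helper, the sample size, and the made-up data-generating process in which Z drives both X and Y while X has no effect on Y at all. It only shows the mechanics in a clean linear case and says nothing about how well this works on real studies.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(rows[r][i] * rows[r][j] for r in range(n)) for j in range(k)]
         + [sum(rows[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: Z is a confounder, X has NO real effect on Y.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

naive = ols([x], y)        # Y ~ X
adjusted = ols([x, z], y)  # Y ~ X + Z

print("X coefficient without Z:", round(naive[1], 2))
print("X coefficient with Z:   ", round(adjusted[1], 2))
print("Z coefficient:          ", round(adjusted[2], 2))
```

In this setup the X coefficient is clearly nonzero when Z is left out and shrinks toward zero once Z is included, with the effect landing on the Z coefficient instead. That only holds because the simulated relationship really is linear and Z is measured without error.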

I don't actually know how the method you suggest compares in the limit of finer bins. It's possible it achieves similar results?

The smaller bins approach is adjustment via stratification.

Good primer on both here: https://www.mynutritionscience.com/p/statistical-adjustment

My understanding is that in the limit, it does the same thing, but with more of a flattened tree representation.
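A rough way to poke at the "finer bins" question is to simulate it. The sketch below is an assumption-laden toy, not anyone's published method: `within_bin_slope` is a hypothetical helper that sorts on Z, cuts the data into equal-count bins, and pools the Y-on-X slope computed inside each bin. The data are made up so that Z drives both X and Y and X has no effect of its own.

```python
import random

def within_bin_slope(x, y, z, n_bins):
    """Pooled slope of y on x, computed inside equal-count bins of z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    size = len(order) // n_bins
    sxy = sxx = 0.0
    for b in range(n_bins):
        start = b * size
        # Last bin absorbs any leftover points
        end = len(order) if b == n_bins - 1 else start + size
        idx = order[start:end]
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        # Deviations are taken from the bin's own means, so only
        # within-bin variation contributes to the slope.
        sxy += sum((x[i] - mx) * (y[i] - my) for i in idx)
        sxx += sum((x[i] - mx) ** 2 for i in idx)
    return sxy / sxx

# Hypothetical data: Z is a confounder, X has NO real effect on Y.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

slopes = {k: within_bin_slope(x, y, z, k) for k in (1, 5, 50)}
for k, s in slopes.items():
    print(f"{k:>2} bins: pooled X slope = {s:.3f}")
```

With one bin this is just the unadjusted slope. As the bins get narrower, Z varies less inside each one and the pooled slope drifts toward zero, which is also where the regression-with-Z coefficient ends up in this linear toy case. With coarse bins some confounding is left over inside each bin, and with very fine bins on real data you run out of points per bin.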