by whoisnnamdi 2499 days ago
Cool stuff, thanks for sharing publicly.

Did you all consider using Double Selection [1] or Double Machine Learning [2]?

The reason I ask is that your approach is very reminiscent of a Lasso-style regression where you first run Lasso for feature selection and then re-run a normal OLS with only the selected controls included (Post-Lasso). This is somewhat problematic because Lasso has a tendency to drop too many controls if they are too correlated with one another, introducing omitted variable bias. Compounding the issue, some of those variables may be correlated with the treatment variable, which increases the chance they will be dropped.
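
To spell out why a dropped control matters, here is the textbook omitted variable bias formula for the simplest case. The symbols are illustrative and not from the post: y is the outcome, d the treatment, x a single control that Lasso dropped, and alpha the causal effect of interest.

```latex
% True model with one control x:
y = \alpha d + \beta x + \varepsilon, \qquad \operatorname{Cov}(d, \varepsilon) = \operatorname{Cov}(x, \varepsilon) = 0
% Regressing y on d alone (x omitted), the OLS slope converges to
\hat{\alpha}_{\text{short}} \;\xrightarrow{p}\; \alpha + \beta \, \frac{\operatorname{Cov}(d, x)}{\operatorname{Var}(d)}
```

The bias term vanishes only if x has no effect on y (beta = 0) or is uncorrelated with the treatment. A control that is strongly correlated with d but only weakly related to y is exactly the kind an outcome-only Lasso tends to drop, yet it can still bias the estimate.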

The proposed solution is to run two separate Lasso regressions, one with the original dependent variable and another with the treatment variable as the dependent variable, recover two sets of potential controls, and then use the union of those sets as the final set of controls. This is explained in simple language at [3].
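
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the procedure from [1] verbatim: the function name `double_selection`, the use of scikit-learn, and the cross-validated penalty are my own assumptions (the papers have their own penalty choice), and the data-generating process is made up.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression


def double_selection(y, d, X, cv=5):
    """Hypothetical helper: estimate the effect of d on y with double selection."""
    # Step 1: Lasso of the outcome on the candidate controls.
    sel_y = np.flatnonzero(LassoCV(cv=cv).fit(X, y).coef_)
    # Step 2: Lasso of the treatment on the same candidate controls.
    sel_d = np.flatnonzero(LassoCV(cv=cv).fit(X, d).coef_)
    # Step 3: keep the union of both selected sets.
    keep = np.union1d(sel_y, sel_d)
    # Step 4: plain OLS of the outcome on the treatment plus the kept controls.
    Z = np.column_stack([d, X[:, keep]])
    ols = LinearRegression().fit(Z, y)
    return ols.coef_[0], keep


# Synthetic data (assumed for illustration): the true effect of d on y is 2.0.
# X[:, 0] is a confounder that drives the treatment strongly but the outcome
# only weakly, which is the case where outcome-only selection is risky.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 2.0 * d + 0.2 * X[:, 0] + X[:, 2] + rng.normal(size=n)

effect, kept = double_selection(y, d, X)
print(f"estimated effect: {effect:.3f}, controls kept: {kept.size}")
```

The point of the second Lasso is that the confounder gets picked up through its relationship with the treatment even when its direct relationship with the outcome is weak.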

Now, you all are using PCA, not Lasso, so I don't know if these concerns apply or not. My sense is that you still may be omitting variables if the right variables are not included at the start, which is not a problem that any particular methodology can completely avoid. Would love to hear your thoughts.

Also, you don't show any examples or performance testing of your method. For example, you could demonstrate that, in a situation where you "know" (via an A/B test, perhaps) what the "true" causal effect is, your method recovers a similar point estimate. As presented, how do we / you know that this is generating reasonable results?

[1] http://home.uchicago.edu/ourminsky/Variable_Selection.pdf
[2] https://arxiv.org/abs/1608.00060
[3] https://medium.com/teconomics-blog/using-ml-to-resolve-exper...

1 comments

Thanks! Yes, the concerns you mentioned would also apply to PCA. What we've actually done to help alleviate this is take a union of components from y-aware [1] and normal PCA to capture variables that are correlated with both the dependent variables and (hopefully) most of the treatment variables. This seems similar to the double selection approach you mention. The difference is that, since we are trying to run this at scale for 1000s of treatment variables, running a feature selection with each of those treatment variables as the dependent variable isn't really feasible, so the normal PCA acts as a proxy for this part of the double selection.
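
For readers who want to see the shape of that idea, here is a rough sketch and not our production code. It assumes y-aware scaling means multiplying each centered column by the slope of a one-variable regression of y on that column (my reading of [1]), that "normal PCA" is run on standardized columns, and that the "union" is just the two sets of component scores stacked side by side. All function names and the toy data are hypothetical.

```python
import numpy as np


def pca_scores(X, k):
    """Scores on the first k principal components of X (columns are centered here)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def y_aware_scale(X, y):
    """Rescale each centered column by its single-variable OLS slope against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    slopes = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return Xc * slopes


def union_controls(X, y, k_plain=3, k_aware=3):
    """Stack plain PCA scores and y-aware PCA scores as one control matrix."""
    # Assumes no constant columns, otherwise the standardization divides by zero.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    plain = pca_scores(Xs, k_plain)
    aware = pca_scores(y_aware_scale(X, y), k_aware)
    return np.hstack([plain, aware])


# Toy data (assumed): column 0 has tiny variance but drives y, the rest is noise.
rng = np.random.default_rng(0)
scales = np.array([0.1] + [5.0] * 9)
X = rng.normal(size=(500, 10)) * scales
y = 10.0 * X[:, 0] + 0.1 * rng.normal(size=500)

controls = union_controls(X, y)
first_aware = controls[:, 3]  # first y-aware component
print(controls.shape, abs(np.corrcoef(first_aware, X[:, 0])[0, 1]))
```

In the toy data the y-aware component locks onto the low-variance column that actually predicts y, while the plain components summarize the overall covariate structure without looking at any outcome or treatment.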

Regardless, we're never going to completely remove omitted variable bias, as we're never going to capture 100% of relevant variables. One way we monitor our model's bias is by comparing the error distributions of users in the treatment and control groups. If these aren't similar, there's too much bias in our estimate of the treatment effect, so we wouldn't want to serve an estimate of the treatment effect for this variable to our customers.
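
One simple way to run that kind of comparison is a two-sample Kolmogorov-Smirnov test on the model residuals. This is only an illustrative sketch: the choice of the KS test, the helper name `compare_residuals`, and the simulated residuals are assumptions, not a description of the actual check.

```python
import numpy as np
from scipy.stats import ks_2samp


def compare_residuals(residuals, treated):
    """Hypothetical helper: KS statistic and p-value, treated vs. control residuals."""
    residuals = np.asarray(residuals, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    result = ks_2samp(residuals[treated], residuals[~treated])
    return result.statistic, result.pvalue


# Simulated residuals (assumed): one well-behaved case, and one where the model
# is systematically off by 1.0 for treated users.
rng = np.random.default_rng(0)
treated = rng.random(2000) < 0.5
resid_ok = rng.normal(size=2000)
resid_biased = resid_ok + 1.0 * treated

stat_ok, p_ok = compare_residuals(resid_ok, treated)
stat_biased, p_biased = compare_residuals(resid_biased, treated)
print(f"similar groups: D={stat_ok:.3f}")
print(f"shifted groups: D={stat_biased:.3f}, p={p_biased:.2e}")
```

A large statistic for a given treatment variable would be the signal to withhold that estimate. Where to set the cutoff is a judgment call that this sketch does not settle.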

The current product is in beta and we're working with some of our current customers to try to re-create our results with A/B tests. I'm hoping that by our GA release in the fall we'll have some case studies with specific examples!

[1] http://www.win-vector.com/blog/2016/05/pcr_part2_yaware/