by yummyfajitas 3549 days ago
So we both agree that if the bias is linear, and your model is linear, you capture it. Similarly if the model involves interaction (score x is_black), and you include linear interaction terms, you'll also capture it.
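
A minimal sketch of that point, using simulated data rather than anything from this thread: the coefficients in TRUE_BETA, the noise level and the helper names (solve, ols) are all assumptions made up for illustration. A linear model that includes the score x is_black term recovers the group-dependent slope, while one without it averages the two slopes together.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

rng = random.Random(0)
# Hypothetical ground truth: intercept, score slope, group offset, interaction.
TRUE_BETA = [1.0, 2.0, -0.5, -0.8]

X, y = [], []
for _ in range(5000):
    score = rng.gauss(0, 1)
    is_black = rng.randint(0, 1)
    row = [1.0, score, is_black, score * is_black]
    X.append(row)
    y.append(sum(b * v for b, v in zip(TRUE_BETA, row)) + rng.gauss(0, 0.5))

beta_full = ols(X, y)                               # includes the interaction term
beta_no_interaction = ols([r[:3] for r in X], y)    # drops score * is_black

print("with interaction:   ", [round(b, 2) for b in beta_full])
print("without interaction:", [round(b, 2) for b in beta_no_interaction])
```

With the interaction column the fitted coefficients should land close to TRUE_BETA. Without it, the single score slope ends up between the two group slopes, so the group-specific effect is no longer captured.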

Now the question arises: what if things are more complex?

In real life they always are, both in your biasing factor and in the rest of the model. So we've cooked up all sorts of fun models like SVMs, random forests and neural networks to analyze such complicated relationships and find hidden features and relations that we didn't think of. Bias is one such feature.

If I built an algorithm that learned to display different ads to mobile and desktop people (i.e., treat mobile "time on site" differently from desktop "time on site"), would you be surprised by this?

1 comments

That makes it clearer. I got thrown off by the claim that a standard algorithm will be able to de-bias even if no de-biasing machinery has been built into it. BTW the machinery may be implicit in the choice of the model.

Simple toy example: say Y is a threshold function of X plus high-variance noise. I draw samples from this and scale down all y_i's whose x_i exceeds the (unknown) threshold. In other words, my corruption process is dependent on X. We can make it depend on Y too. These would require explicit modeling. Just throwing a uniformly rich class of P(X,Y) at the problem won't by itself fix this. We have to carve that space of P(X,Y) using knowledge of the possible corruption process to get a good model of the behavior before the corruption is applied.
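
Here is a rough sketch of that toy example under assumptions of my own: Y = step(X) + Gaussian noise, a multiplicative corruption factor SCALE, and (for brevity) a threshold that is treated as known in the correction step, which the example above does not assume. A flexible binned-mean fit stands in for the "rich class" of models. It faithfully learns the corrupted relationship, whereas using knowledge of the corruption form (scaling also shrinks the noise) lets us estimate the factor and undo it.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical values chosen for illustration only.
THRESHOLD, SCALE, NOISE_SD = 0.0, 0.5, 2.0

xs = [rng.uniform(-1, 1) for _ in range(20000)]
clean = [(1.0 if x > THRESHOLD else 0.0) + rng.gauss(0, NOISE_SD) for x in xs]
# Corruption depends on X: samples past the threshold are scaled down.
corrupted = [y * SCALE if x > THRESHOLD else y for x, y in zip(xs, clean)]

def binned_means(xs, ys, n_bins=10):
    """Flexible estimate of E[Y | X] on equal-width bins over [-1, 1]."""
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        idx = min(int((x + 1) / 2 * n_bins), n_bins - 1)
        bins[idx].append(y)
    return [statistics.fmean(b) for b in bins]

# The flexible fit recovers the corrupted level above the threshold,
# not the clean level of 1.0.
fit = binned_means(xs, corrupted)
upper_fit = statistics.fmean(fit[5:])   # bins covering x > 0

# Explicit corruption model: scaling by SCALE also scales the noise, so
# the ratio of standard deviations on either side estimates the factor.
upper = [y for x, y in zip(xs, corrupted) if x > THRESHOLD]
lower = [y for x, y in zip(xs, corrupted) if x <= THRESHOLD]
scale_hat = statistics.stdev(upper) / statistics.stdev(lower)
recovered = statistics.fmean(upper) / scale_hat

print("flexible fit above threshold:", round(upper_fit, 2))
print("estimated corruption factor: ", round(scale_hat, 2))
print("level after undoing it:      ", round(recovered, 2))
```

The trick of reading the factor off the noise variance only works because the corruption is assumed to be a pure rescaling. A different corruption process would need a different piece of explicit modeling, which is the point being made above.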

BTW we have gone way off on a tangent, but that was a good conversation.