
That makes it clearer. I got thrown off by the claim that a standard algorithm will be able to de-bias even if no de-biasing machinery has been built into it. BTW the machinery may be implicit in the choice of the model.

Simple toy example: say Y is a threshold function of X plus high-variance noise. I draw samples from this and scale down all the y_i's that exceed the (unknown) threshold. In other words, my corruption process depends on X. We can make it depend on Y too. Either case would require explicit modeling. Just throwing a uniformly rich class of P(X,Y) at it won't by itself fix this. We have to carve up that space of P(X,Y) using knowledge of the possible corruption processes to get a good model of the behavior before the corruption is applied.
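A rough sketch of that toy setup, in case it helps. Everything concrete here is my assumption, not part of the argument: the threshold 0.3, the scale factor 0.5, the noise level, and reading "exceed the threshold" as x_i above the threshold. Binned means stand in for the "rich" model; the corrected estimate only works because it is told how the corruption acts.

```python
import numpy as np

def simulate(n=20000, threshold=0.3, scale=0.5, noise_sd=2.0, seed=0):
    """Draw (x, y_clean, y_obs); all parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    # Clean signal: a step at `threshold`, buried in high-variance noise.
    y_clean = (x > threshold).astype(float) + rng.normal(0.0, noise_sd, size=n)
    # Corruption depends on X: samples above the threshold are scaled down.
    y_obs = np.where(x > threshold, scale * y_clean, y_clean)
    return x, y_clean, y_obs

def binned_means(x, y, bins=20):
    """Flexible but corruption-agnostic regression: mean of y per x-bin."""
    edges = np.linspace(-1.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    sums = np.bincount(idx, weights=y, minlength=bins)
    counts = np.bincount(idx, minlength=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / counts

def plateau_estimates(scale=0.5):
    """Estimate the upper level of the step, naively and with correction."""
    x, _, y_obs = simulate(scale=scale)
    centers, means = binned_means(x, y_obs)
    # Average the bins that sit well above the (assumed) threshold.
    naive = means[centers > 0.5].mean()
    # Undoing the corruption requires knowing its form and its scale.
    corrected = naive / scale
    return naive, corrected

if __name__ == "__main__":
    naive, corrected = plateau_estimates()
    print(f"naive upper level:     {naive:.2f}")
    print(f"corrected upper level: {corrected:.2f}")
```

The binned fit is as flexible as you like, yet it converges to the corrupted step (upper level near `scale`, not 1). More data or a richer class doesn't change that; only the extra line that encodes the corruption does.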

BTW we have gone way off on a tangent, but that was a good conversation.