by yummyfajitas 3551 days ago
Here's my point.

(1) is only possible if your data provides access to the biasing variable, perhaps via redundant encoding. This is the standard critique folks make.

As per (1), the biasing variable is available. Now if the algorithm is expressive enough to describe the functional form of the bias (e.g. the bias is quadratic, and the model includes quadratic terms), it will fix that bias.
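To make the quadratic case concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the variable names (`z` for the biasing variable, `t` for the true quality, `x` for the recorded score), the bias strength of 0.5, and the noise levels are made up, not taken from any real dataset. A model that only sees `x` inherits the bias, while a model that also gets `z` and `z**2` learns a coefficient near -0.5 that cancels it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.uniform(-2, 2, n)          # biasing variable (assumed to be observable)
t = rng.normal(0, 1, n)            # true underlying quality (never observed directly)
x = t + 0.5 * z**2                 # recorded score, inflated by a quadratic bias in z
y = t + rng.normal(0, 0.1, n)      # outcome we want to predict


def fit(columns, target):
    """Ordinary least squares with an intercept; returns (coefficients, RMSE)."""
    X = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rmse = np.sqrt(np.mean((X @ coef - target) ** 2))
    return coef, rmse


# Model without access to the functional form of the bias.
coef_lin, rmse_lin = fit([x], y)

# Model with linear and quadratic terms in z: expressive enough to undo the bias.
coef_quad, rmse_quad = fit([x, z, z**2], y)

print("x only:        rmse =", round(rmse_lin, 3))
print("x + z + z^2:   rmse =", round(rmse_quad, 3))
print("learned z^2 coefficient:", round(coef_quad[3], 3))
```

The second fit only works because `z` is in the data and the quadratic term is in the model. Drop either one and the bias stays in the predictions.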

You're right that there are lots of hidden variables that we can't use in a predictor. Murderous intent and mafia membership are also not available as predictive factors. You could build a more accurate model if you had that data. So what?

1 comment

The problem with (2) isn't just that your model is less precise than it could be; it's that your model may be inadvertently biased because all of the data it was trained on was biased. This comment (https://news.ycombinator.com/item?id=12625917) gives a good example. No amount of expressivity in the algorithm will account for the fact that the Friendface model (read the comment) was trained on a predominantly white userbase, whereas FaceSpace's model was trained on a predominantly urban black userbase.
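A rough sketch of that failure mode, on synthetic data. This does not reconstruct the linked comment's example; the two populations, their slopes, and the names `x_a`, `x_b` are hypothetical. The model is trained only on population A and then scored on population B, where the relationship between feature and outcome is different. Raising the polynomial degree (more expressivity) does nothing for B, because nothing in the training data says B exists.

```python
import numpy as np

rng = np.random.default_rng(1)


def sample(n, slope):
    """Draw a feature x and an outcome y = slope * x + noise (illustrative only)."""
    x = rng.uniform(0, 1, n)
    y = slope * x + rng.normal(0, 0.05, n)
    return x, y


x_a, y_a = sample(2000, slope=1.0)    # population A: the only data the model sees
x_b, y_b = sample(2000, slope=-1.0)   # population B: absent from training


def errors(degree):
    """Fit a polynomial of the given degree on A; return RMSE on A and on B."""
    coeffs = np.polyfit(x_a, y_a, degree)
    err_a = np.sqrt(np.mean((np.polyval(coeffs, x_a) - y_a) ** 2))
    err_b = np.sqrt(np.mean((np.polyval(coeffs, x_b) - y_b) ** 2))
    return err_a, err_b


results = {degree: errors(degree) for degree in (1, 3, 5)}
for degree, (err_a, err_b) in results.items():
    print(f"degree {degree}: rmse on A = {err_a:.3f}, rmse on B = {err_b:.3f}")
```

The error on A stays near the noise floor at every degree, while the error on B stays large. The fix has to come from the data, not from a fancier model.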