by daenz
2437 days ago
I'm not sure I follow their car crash diagram and explanation. They lay out that one ethnicity might prefer red cars more than others, that drivers of red cars tend to get into more crashes, and that training an ML model with "red car" as a feature would therefore bias it against that ethnicity. I got that part. What I don't get is how the new "risky behavior" node can be assumed to have a completely uniform distribution of ethnicities within it. The author has no problem saying that an ethnicity can be linked to one behavior (purchasing red cars) but not to another (driving more riskily). This seems logically inconsistent.
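For what it's worth, a toy version of the diagram makes it easier to see what the uniform-distribution assumption is doing. This is only a sketch of how the diagram reads: the group names, every probability, and the helper names (`joint`, `prob`, `crash_rate`, `red_only_score`) are made up for illustration and do not come from the article. The equal values in `P_RISKY` are exactly the assumption being questioned above.

```python
from itertools import product

# All numbers below are hypothetical, chosen only to illustrate the diagram.
P_GROUP = {"A": 0.5, "B": 0.5}
P_RISKY = {"A": 0.2, "B": 0.2}  # the assumption in question: same for every group
P_RED = {  # P(red car | group, risky): group A simply likes red cars more
    ("A", True): 0.7, ("A", False): 0.3,
    ("B", True): 0.5, ("B", False): 0.1,
}
P_CRASH = {True: 0.3, False: 0.1}  # crashes depend on risky behavior only


def joint():
    """Yield (group, risky, red, crash, probability) for every combination."""
    for g, risky, red, crash in product(P_GROUP, (True, False), (True, False), (True, False)):
        p = P_GROUP[g]
        p *= P_RISKY[g] if risky else 1 - P_RISKY[g]
        p *= P_RED[(g, risky)] if red else 1 - P_RED[(g, risky)]
        p *= P_CRASH[risky] if crash else 1 - P_CRASH[risky]
        yield g, risky, red, crash, p


def prob(event, given):
    """Conditional probability P(event | given) by brute-force enumeration."""
    num = sum(p for *x, p in joint() if given(*x) and event(*x))
    den = sum(p for *x, p in joint() if given(*x))
    return num / den


def crash_rate(group):
    """True crash rate of a group under the toy model."""
    return prob(lambda g, k, red, c: c, lambda g, k, red, c: g == group)


def red_only_score(group):
    """Average risk a model would assign to a group if it only saw car color."""
    p_crash_red = prob(lambda g, k, red, c: c, lambda g, k, red, c: red)
    p_crash_other = prob(lambda g, k, red, c: c, lambda g, k, red, c: not red)
    p_red = prob(lambda g, k, red, c: red, lambda g, k, red, c: g == group)
    return p_red * p_crash_red + (1 - p_red) * p_crash_other


for group in P_GROUP:
    print(group, round(crash_rate(group), 4), round(red_only_score(group), 4))
```

With equal `P_RISKY` the two groups have identical true crash rates, yet the color-only model scores group A as riskier, which is the bias the article describes. Change `P_RISKY` so the groups differ and the true rates diverge too, so the conclusion depends on that assumption rather than following from the diagram itself.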
|
If members of my nation get drunk more often than those of some other nation, then while it's offensive to say I am "34% a drunkard", on average it might hold. Instead of forbidding this type of inference, I'd rather rely on more signals to figure out what kind of person I am specifically when making individualized decisions. They sidestep this problem by adding "risky behavior", which is not contained in the input dataset, and modeling it as a hidden variable in Bayesian inference. But "risky behavior" might still be correlated with both ethnicity and red cars; the correlation just isn't visible from outside. So if my nation is 34% drunkards but the neighboring one is only 11%, the conditional probability will likely still come out higher for my nation, only obfuscated by the Bayesian hidden state. I am not sure why that would improve fairness.
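The arithmetic behind that last point can be sketched in a few lines. The 34% and 11% figures are the ones from the comment, reused here as the prior on the hidden "risky behavior" variable (that mapping is an assumption); the crash probabilities and the function names `p_crash` and `p_risky_given_crash` are hypothetical.

```python
# Prior on the hidden variable per group: the comment's 34% / 11% figures,
# reused as P(risky | group). Treating them this way is an assumption.
P_RISKY = {"mine": 0.34, "neighbor": 0.11}

# Hypothetical crash probabilities given the hidden variable.
P_CRASH = {True: 0.3, False: 0.1}


def p_crash(group):
    """P(crash | group), marginalizing over the hidden risky-behavior state."""
    r = P_RISKY[group]
    return r * P_CRASH[True] + (1 - r) * P_CRASH[False]


def p_risky_given_crash(group):
    """Posterior P(risky | crash, group) via Bayes' rule."""
    r = P_RISKY[group]
    return r * P_CRASH[True] / p_crash(group)


for group in P_RISKY:
    print(group, round(p_crash(group), 4), round(p_risky_given_crash(group), 4))
```

As long as the prior on the hidden state differs by group, the marginal crash probability differs by group as well. Under these assumptions the hidden variable changes where the group difference lives in the model, not whether it exists.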