by AnthonyMouse 2445 days ago

> What I don't get is how the creation of the "risky behavior" node can be assumed to have a completely uniform distribution of ethnicities inside of it.

It's a much broader problem than that, because the direction of causation can be extraordinarily difficult to establish in general.

Changing the color of your car shouldn't change your ethnicity, but what if it does? Suppose you're white with Spanish ancestry and Hispanics are the group who like red cars. Paint your car red and some red-car-preferring Hispanics may be more inclined to associate with you and thereby cause you to be more immersed in Hispanic culture and start to identify as Hispanic rather than white.

And that's a silly one just to show that even the exemplar could be wrong. More plausibly, what if the causation between "risky behavior" and "red car" is reversed? We know that colors can affect human behavior. If getting into a red car makes you drive more aggressively then you have a direct causal chain between being more likely to buy a red car (for any reason) and being more likely to drive aggressively and get into a car crash.

That means that in order to use a method like this you would first need to prove the direction of causation between the two variables. But that's a tall hill to climb when one of the factors you're trying to prove causation with is the one you don't have good data on.
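To make the direction problem concrete, here is a toy sketch in Python. All of the probabilities are made up for illustration, and `joint_from_cause` is a hypothetical helper, not anything from the method under discussion. It builds two two-node models, one where risky behavior causes the red car and one where the red car causes risky behavior, and tunes the second so that both produce exactly the same observed data.

```python
from fractions import Fraction as F

def joint_from_cause(p_cause, p_effect_given_cause, p_effect_given_not_cause):
    """Joint distribution {(cause, effect): probability} for a model cause -> effect."""
    joint = {}
    for cause in (True, False):
        p_c = p_cause if cause else 1 - p_cause
        p_e = p_effect_given_cause if cause else p_effect_given_not_cause
        joint[(cause, True)] = p_c * p_e
        joint[(cause, False)] = p_c * (1 - p_e)
    return joint

# Model A (assumed numbers): risky -> red car. Keys are (risky, red).
p_risky = F(3, 10)
model_a = joint_from_cause(p_risky, F(6, 10), F(2, 10))

# Model B: red car -> risky, with parameters read off model A's observed joint.
p_red = model_a[(True, True)] + model_a[(False, True)]
p_risky_given_red = model_a[(True, True)] / p_red
p_risky_given_not_red = model_a[(True, False)] / (1 - p_red)
model_b = joint_from_cause(p_red, p_risky_given_red, p_risky_given_not_red)

# Re-key model B from (red, risky) to (risky, red) so the two can be compared.
model_b_rekeyed = {(risky, red): p for (red, risky), p in model_b.items()}
same_observations = model_a == model_b_rekeyed

# What happens if you force the car to be red (an intervention, not an observation)?
risk_if_painted_red_a = p_risky            # A: color has no effect on behavior
risk_if_painted_red_b = p_risky_given_red  # B: color drives behavior

print(same_observations, risk_if_painted_red_a, risk_if_painted_red_b)
```

The two models agree on every observable frequency but disagree about what painting the car red does to the driver, which is why the observed correlation alone can't tell you which way the arrow points.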

There is also a straightforward way to tell when a method like this is definitely getting the math wrong: does it make prediction accuracy for that class of people worse? If your assumptions are correct then it shouldn't, so if it does then you've unambiguously failed.
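A minimal sketch of that check in Python, assuming you have true outcomes, the predictions before and after the adjustment, and a group label for each person. The function names and the `tolerance` parameter are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: fraction of correct predictions} for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

def groups_made_worse(y_true, baseline_pred, adjusted_pred, groups, tolerance=0.0):
    """Return {group: (accuracy_before, accuracy_after)} for groups whose accuracy dropped."""
    before = accuracy_by_group(y_true, baseline_pred, groups)
    after = accuracy_by_group(y_true, adjusted_pred, groups)
    return {
        g: (before[g], after[g])
        for g in before
        if after[g] < before[g] - tolerance
    }

# Tiny made-up example: the adjustment breaks one prediction in group "b".
y_true = [1, 0, 1, 0]
baseline = [1, 0, 1, 0]
adjusted = [1, 0, 0, 0]
groups = ["a", "a", "b", "b"]
worse = groups_made_worse(y_true, baseline, adjusted, groups)
print(worse)
```

On real data you would want a nonzero `tolerance` or a proper significance test, since small accuracy differences between two runs can be noise.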