What If Algorithms Could Be Fair? (humanreadablemag.com)
19 points by pekalicious 2437 days ago
5 comments

I'm not sure I follow their car crash diagram and explanation. They've laid out that one ethnicity might prefer red cars more than others, and drivers of red cars tend to get into more crashes, and that training ML with "red cars" as a feature would lead to a bias against that ethnicity. I got that part. What I don't get is how the creation of the "risky behavior" node can be assumed to have a completely uniform distribution of ethnicities inside of it. The author has no problem saying that an ethnicity can have one causal behavior (purchasing red cars) but not another (being riskier drivers). This seems logically inconsistent.

There is a strong push for "fairness", see e.g. the "Toronto Declaration". I think all it would do is completely halt the progress of AI and install bureaucracy down to the lowest decision levels, paralyzing ML research as a whole. Nobody seems to consider that we are in a clash of different cultures with different sensitivities, and that there is no single common platform for stating what is "fair". I am worried the loudest voice would set the trend and we will have some insanity enforced all the way down. There are even calls to ban "blackbox" ML, basically allowing only trivial parts in any kind of decision making.

If members of my nation get drunk more often than those of some other nation, then while it's offensive to say I am a 34% drunkard, on average it might hold; instead of forbidding this type of inference, I'd rather rely on more signals to figure out what kind of person I am specifically, for individualized decisions. They bypass this problem by adding "risky behavior", which is not contained in the input dataset, so they just decide to model it as a hidden variable of Bayesian inference, where "risky behavior" might be correlated with ethnicity and red cars anyway, just not visibly from outside. So if my nation is 34% drunkard but the neighboring one is only 11%, the conditional probability will likely be higher for my nation anyway, but obfuscated by the use of a Bayesian hidden state. I am not sure why that would improve fairness.
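
To make the arithmetic in the comment above concrete, here is a minimal sketch of marginalizing over a hidden binary node. The 34% and 11% base rates come from the comment; the outcome probabilities in `P_OUTCOME_GIVEN_TRAIT` and all names are assumptions invented for illustration, not anything taken from the article.

```python
from fractions import Fraction

# Base rates of the hidden trait per group, taken from the comment above.
P_TRAIT_GIVEN_GROUP = {
    "my_nation": Fraction(34, 100),
    "neighbor": Fraction(11, 100),
}

# Assumed for illustration only: the outcome depends on the hidden trait alone.
P_OUTCOME_GIVEN_TRAIT = {True: Fraction(3, 10), False: Fraction(1, 10)}


def p_outcome_given_group(group):
    """Marginalize out the hidden trait: sum over t of P(outcome | t) * P(t | group)."""
    p_trait = P_TRAIT_GIVEN_GROUP[group]
    return (P_OUTCOME_GIVEN_TRAIT[True] * p_trait
            + P_OUTCOME_GIVEN_TRAIT[False] * (1 - p_trait))


for group in P_TRAIT_GIVEN_GROUP:
    print(group, float(p_outcome_given_group(group)))
```

Under these assumed numbers the group with the higher hidden base rate still ends up with the higher predicted probability, even though group membership never feeds the outcome directly.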

> There is a strong push for "fairness", see e.g. "Toronto Declaration". I think all it would do is completely halt progress of AI and install bureaucracy to the lowest decision levels, paralyzing whole ML research.

It would only paralyze those who paid attention to the Toronto Declaration. You’re right that you can’t make ML fair, because the universe isn’t fair; fairness is a property of human judgements about facts. The facts remain the same regardless of one’s feelings.

https://www.chrisstucchio.com/pubs/slides/crunchconf_2018/sl...

AI Ethics, Impossibility Theorems and Tradeoffs

Except any two humans don't have matching ideas about what's fair, which means that they're both unfair from each other's perspective.

Humans are in reality much less fair than algorithms.

> there is no single common platform for stating what is "fair".

This is the crux of the issue and as always, most people seem to miss it. Often “fair” is used as shorthand for “does what I think is right”.

"forbidding this type of inference"

Isn't this just a misleading way to say "holding a certain causal belief"? Why exactly would that be a bad thing? If you reject one set of causal beliefs, you necessarily hold a different set.

Some beliefs are correlated with reality, others aren't. If GP's assertion about 34% more drinking on average is true, then rejecting it isn't "holding a different set of beliefs", it's just being wrong.

If there's an issue worth pursuing here, it's educating people to stop using average population statistics to rate individuals from populations. Usually the variance within a population makes population-level statistics useless for evaluating individuals.
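
A rough sketch of the point about within-population variance, using Python's standard-library `statistics.NormalDist`. The two distributions and their parameters are hypothetical, chosen so that the means differ a little while the spread is wide; how much this matters in practice depends on the gap between means relative to the spread.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical score distributions for two populations:
# the means differ slightly, the spreads are wide.
group_a = NormalDist(mu=10.5, sigma=5.0)
group_b = NormalDist(mu=10.0, sigma=5.0)

# For independent normals, A - B is normal with mean mu_a - mu_b
# and variance sigma_a^2 + sigma_b^2.
difference = NormalDist(
    mu=group_a.mean - group_b.mean,
    sigma=sqrt(group_a.variance + group_b.variance),
)

# Probability that a random member of A scores higher than a random member of B.
p_a_exceeds_b = 1.0 - difference.cdf(0.0)

# Overlapping area of the two densities (1.0 would mean identical distributions).
overlap = group_a.overlap(group_b)

print(f"P(A > B) = {p_a_exceeds_b:.3f}, overlap = {overlap:.3f}")
```

With these assumed parameters, picking one person from each group and guessing that the one from the higher-mean group scores higher is only slightly better than a coin flip.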

Rejecting the causal relationship is not the same as rejecting the correlation, right? Why can't (or shouldn't) one separate the two?

You're right in principle, but the point here is about the reasons for rejecting a causal model. The issue people seeking fairness in statistics run into is rejecting models based on what ought to be, instead of what is. A causal model can be totally unfair, and yet also correct (insofar as an approximation is considered correct).

Taking the example from our parallel discussion, if the data says being male is correlated with risky driving, and it seems to fit the causal model of "male -> risky", it would be wrong to reject it just on the grounds of "we're using this model to set insurance rates, so by penalizing males, the model is sexist". It may be that you can come up with a better causal model explaining the correlation - say, cultural history and path dependence - but until you can, rejecting a fitting model based on "it's unfair, reality ought not to be so" is just wrong.

> What I don't get is how the creation of the "risky behavior" node can be assumed to have a completely uniform distribution of ethnicities inside of it.

It's a much broader problem than that, because the direction of causation can be extraordinarily difficult to establish in general.

Changing the color of your car shouldn't change your ethnicity, but what if it does? Suppose you're white with Spanish ancestry and Hispanics are the group who like red cars. Paint your car red and some red-car-preferring Hispanics may be more inclined to associate with you and thereby cause you to be more immersed in Hispanic culture and start to identify as Hispanic rather than white.

And that's a silly one just to show that even the exemplar could be wrong. More plausibly, what if the causation between "risky behavior" and "red car" is reversed? We know that colors can affect human behavior. If getting into a red car makes you drive more aggressively then you have a direct causal chain between being more likely to buy a red car (for any reason) and being more likely to drive aggressively and get into a car crash.

That means that in order to use this you would first need to prove the direction of causation between the two behaviors. But that's a tall hill to climb when one of the factors you're trying to prove causation with is the one you don't have good data on.

There is also a straightforward way to tell when a method like this is definitely getting the math wrong: does it make the prediction rate for that class of people worse? If your assumptions are correct then it shouldn't, so if it does then you've unambiguously failed.
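
One way to read the check described above, as a hedged sketch: compare per-group accuracy before and after the fairness adjustment and flag any group that got worse. The function names and the tiny label lists are hypothetical, and "prediction rate" is interpreted here as plain accuracy, which is an assumption.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Fraction of correct predictions within each group."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / counts[group] for group in counts}


def groups_made_worse(y_true, baseline_pred, adjusted_pred, groups):
    """Groups whose accuracy dropped after switching to the adjusted model."""
    before = accuracy_by_group(y_true, baseline_pred, groups)
    after = accuracy_by_group(y_true, adjusted_pred, groups)
    return sorted(group for group in before if after[group] < before[group])


# Hypothetical toy data: 1 = crash, 0 = no crash.
y_true = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]
baseline_pred = [1, 0, 0, 0, 1, 0]  # one mistake in group A
adjusted_pred = [1, 0, 1, 0, 0, 0]  # one mistake in group B

print(groups_made_worse(y_true, baseline_pred, adjusted_pred, groups))
```

In this toy data the adjustment fixes the error in group A but introduces one in group B, so group B is flagged.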

Right, it seems very plausible that car culture differs between cultures. Is it truly unreasonable to suggest that perhaps more than an average number of Italians are fast, aggressive drivers? From what I've heard and seen, it's anecdotally true. I wouldn't rule out the possibility of it being statistically true.

And every time I express my desire for autobahns without speed restrictions to crisscross North America, whoever I'm talking to has generally been quick to inform me that Germans can have nice things like that because they are careful/skilled drivers, while Americans are reckless (wreckful) drivers and cannot be trusted at high speeds.

If more than an average number of Italians are fast drivers, it doesn't mean being Italian causes being a fast driver. Is the idea that correlation is not causation in this context really breaking everybody's brain?

Now you may argue that correlation reflects causation in a particular case, sure, but in general, it is not the same, so it seems perfectly logical to me to point out that you can start building your model with certain causal assumptions and without others, without in any way disregarding your statistics.

Is it so hard for you to believe there might plausibly be a causation?

Consider the case of African Americans who are discriminated against by traffic cops. Is it plausible that African Americans, in an attempt [perhaps in vain] to minimize interaction with traffic cops, are more cognizant of traffic laws and drive more conservatively than the average American? I don't know if the data supports that hypothetical, but it seems plausible to me.

Assuming that this were the case, if you were to assume that African Americans drove as well as white Americans, you would be discriminating against the African American population by failing to recognize their safer driving habits.

Whether you or I think there is a causation in specific cases is irrelevant, as is whether we apply charged terms like "racism" to certain causal linkages.

The point is that one is not compelled to believe in the causal link just because there is a statistical link.

So if certain causal links are politically contentious, rejecting them due to "political correctness" is completely separate from rejecting the facts, the statistics that are collected. It is political, but not in opposition to reality.

The article, as I understood it, is puncturing the assertion of objectivity by those who implicitly assert we have to regard all correlations as equally causal or else be against reason and logic.

I certainly do not believe anybody should be compelled to assume a correlation is a causation. However I also do not think one should preemptively rule out the possibility of causation. Without examining the nitty gritty details of any particular situation, we can't know which is the case. We certainly cannot assume one and rule out the other, which I fear is what you assumed I was doing.

The elephant in the room is that the real way to tell whether that is the case would be to use race as a factor the same as age or sex. If African Americans are more careful drivers then that would detect it and take it into account.

But then you have to take the bad with the good. If it turns out that strict adherence to traffic laws that nobody else abides is actually more dangerous than following the normal flow of traffic, it would also detect that and take it into account.

Well it may be the case that they accidentally have a proxy for race already in their data (the "this ethnicity prefers red cars" hypothetical in the article above). So race may already in practice be factored in despite nobody intending for that to be the case (assuming nobody anticipated that a particular metric is a racial proxy). That does not necessarily mean it's being unfair to that race though. It could, hypothetically, mean that it's actually being fair to that race, advantaging them in a system/society that would otherwise disadvantage them.

"then that would detect it and take it into account."

You have a method for automatically deriving causal relationships from correlational data?

I think the two behaviors should be understood as arbitrary for illustrative purposes. The point is, as I understand it, that you can decide that one causal relationship exists and another does not, and derive a model consistent with that and with the observed statistics.

Because, as people give lip service to constantly, but never seem to really adhere to, correlation is different from causation.

Facebook developed its "Look Alike" platform to advertise things to people who "looked like" their current followers. Then they deployed this to hiring, home loans, and housing. The "algorithm" here just amplified whatever biases the company had to begin with. It's pretty unbelievable that Facebook did not recognize this was a problem until they were sued over it.

Making a system fair at the very least requires people designing the system to be fair. It's pretty clear that still does not happen, so I'm pretty skeptical of those that claim it's just around the corner.

Algorithms are based on statistics, or essentially stereotypes. The concept of fairness is something that can’t be adequately injected into an algorithm because it completely depends on what “fair” means and how that changes over time. What is “fair” now won’t be fair in 10 years.

It used to be considered fair to let people smoke when they wanted. Then it was considered fair to have smoking sections and non-smoking sections in restaurants. Now it’s considered fair to ban smoking entirely in restaurants and most public places.

If the algorithm charges men higher car insurance premiums, does that mean it is fair?

I think the whole point is that which causal relationships you assume matters, and they do not have to be derived from correlations. And they should not be, in order to be "fair".

You have a choice of whether or not you believe being male causes car insurance claims. That is independent of the statistical correlations. Ten times a day people say correlation is not causation, but a hundred times a day, I see people implicitly insisting that it necessarily is.

It's not so much that people think correlation implies causation as that, in many practical models, it's correlations you care about, not causation.

If I'm running an insurance agency and not a public policy advocacy group, and my data keeps showing that men have a higher accident rate than women, I can just ignore causation and build my actuarial tables based on that. I don't need a causal model here, at least not until the point I'd want to optimize my models further still, but there are diminishing returns on that.

This makes no sense to me. Everything depends on your causal model. You can't just not have one; if you don't have one, you are treating correlations as causative indiscriminately.

Suppose (just as a toy example) that being young causes accidents, and the population of men is younger, but being male does not cause accidents. You are going to charge mature men too much and lose that business to a competitor with a correct causal model.

This is quite separate from the correlational data.
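
The toy example above can be sketched with exact arithmetic. All numbers here are hypothetical, picked only so that age is the sole cause of accidents while the male population skews younger; none of them come from real actuarial data.

```python
from fractions import Fraction

# Assumed for illustration: accident risk depends on age alone, not on sex.
RISK_BY_AGE = {"young": Fraction(20, 100), "mature": Fraction(5, 100)}

# Hypothetical head counts: the male population skews younger.
POPULATION = {
    ("male", "young"): 600,
    ("male", "mature"): 400,
    ("female", "young"): 300,
    ("female", "mature"): 700,
}


def accident_rate(sex=None, age=None):
    """Expected accident rate over the subgroup matching the given filters."""
    matching = [
        (a, n) for (s, a), n in POPULATION.items()
        if (sex is None or s == sex) and (age is None or a == age)
    ]
    total = sum(n for _, n in matching)
    return sum(RISK_BY_AGE[a] * n for a, n in matching) / total


# Pricing by sex alone: men look riskier overall.
print("male:", float(accident_rate(sex="male")))
print("female:", float(accident_rate(sex="female")))

# Conditioning on age: mature men and mature women carry the same risk.
print("mature male:", float(accident_rate(sex="male", age="mature")))
print("mature female:", float(accident_rate(sex="female", age="mature")))
```

A table priced on sex alone would charge mature men the blended male rate even though, under these assumptions, their risk matches that of mature women, which is the opening a competitor with the age-based model could exploit.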

The insight I get from the article is that the "correctness" of your causal model can incorporate social justice or political correctness, without being objectively mistaken, because causation is not defined by measured correlations.

In a world where most engineers just click through EULAs; don't bother to read the source code of the library they just imported; don't measure the performance of their application before it is deployed; don't run tests after installation; don't author tests; don't test their assumptions, etc. etc., it stands to reason that if an algorithm charges higher car insurance premiums, it may be for totally bullshit reasons totally obscured by some jagoff's "ML" code.

The reason fairness has so much headway among engineers isn't just an aversion towards discrimination among educated people. It's that we all know this stuff is way jankier than we care to ever admit, and that we'd never want to be the data sausage going through the algorithm grinder.

Are there no fair algorithms? I urge the authors to at least give Bogosort another chance!