Hacker News
by ralusek 2379 days ago
I think beyond not feeding race in as a feature to any model, this stuff is mostly nonsense. If you include race as a feature, then I think it's likely that the model will become racist, because race is so highly correlated to behaviors and patterns which are in large part the consequence of all sorts of things, including historical racism, that a model could easily mistake race as a causal factor. If you don't feed race in as a feature, however, the outputs are hardly racist. My impression has been that by and large the argument actually being made is that "we have been trying to correct for historical injustices by actively using race and gender as mechanisms for advantaging minorities and women, and an unbiased model is not properly accounting for these particular objectives."

Take something like a bank loan. If you had a model at a bank which took credit score, income, wealth, and collateral into account, black Americans would have loans rejected at a higher rate than white Americans. Is this model racist? No, this model doesn't even know what race is, all it knows is credit scores, income, wealth, and collateral. Does the fact that black Americans used to be slaves in the US, or were kept out of certain housing markets, contribute towards the fact that black Americans, on average, have lower credit scores, income, wealth, and collateral? Of course. But is this model racist? Literally not at all. It is completely unbiased, and exactly what the model should be. If the case you're making is that you think that there should be a national effort to correct for historical injustices that were done by the state by actively discriminating by race, that is a completely different discussion.
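To make the loan scenario concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the applicants are invented, the group labels "A" and "B" are placeholders, and the approval rule and its cutoffs are hypothetical rather than any bank's actual policy. It only shows the mechanics being argued about: the rule never sees the group label, yet approval rates can still differ by group when the financial inputs differ by group.

```python
from collections import defaultdict

# Hypothetical, hand-made applicants: (group, credit_score, income_k, collateral_k).
# These numbers are invented for illustration and are not real statistics.
APPLICANTS = [
    ("A", 720, 85, 40),
    ("A", 690, 70, 25),
    ("A", 640, 52, 10),
    ("A", 750, 110, 60),
    ("B", 700, 75, 30),
    ("B", 610, 45, 5),
    ("B", 630, 50, 8),
    ("B", 590, 38, 0),
]


def approve(credit_score, income_k, collateral_k):
    """Group-blind rule (assumed cutoffs): only financial features are inputs."""
    return credit_score >= 650 and income_k + collateral_k >= 80


def approval_rates(applicants):
    """Approval rate per group; the group label is used only for reporting."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, score, income, collateral in applicants:
        total[group] += 1
        if approve(score, income, collateral):
            approved[group] += 1
    return {g: approved[g] / total[g] for g in total}


rates = approval_rates(APPLICANTS)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

Whether that gap makes the rule "racist" or simply accurate is exactly the question the rest of the thread argues over; the sketch takes no side.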

Having all of our decision-making apparatuses factor in the infinite pile of historical injustices that may have contributed to an individual's particular circumstances is not the way to go. Keep models simple and limited to what is relevant for that particular criteria. Fix injustices further upstream, or you make the whole system a convoluted nightmare.

11 comments

> If the case you're making is that you think that there should be a national effort to correct for historical injustices that were done by the state by actively discriminating by race, that is a completely different discussion.

That is what proponents of the structural racism model are doing. Here's an example I took from the book Weapons of Math Destruction:

When people are convicted of a crime, they undergo a number of personality tests, including the LSI-R (Level of Service Inventory - Revised). This is a highly detailed questionnaire that asks about prior convictions, whether the prisoner had accomplices in their crimes, whether drugs or alcohol were involved, etc.

It does not ask about race.

What it does ask about are things which highly correlate with race, such as the number of police encounters (no criminal suspicion necessary), the number of friends/family/neighbours who have committed crimes, etc. If two first-time offenders have committed identical crimes but one of them grew up in wealthy suburbs and the other grew up in the rough inner city, they will receive very different scores on the LSI-R.

So what do they use the LSI-R for? They feed it into a model which assigns the offender a recidivism risk score. Then they use that risk factor directly when determining the person's sentence, restrictions, parole eligibility, etc.

So now we're not even talking about historical injustices, we're talking about ongoing injustice based on historical injustice. It's a vicious cycle, or a self-reinforcing feedback loop, if you will. This is a serious problem!

Edit: Just to add another piece of the puzzle, the reason wealthy suburbs vs rough inner cities correlate so highly with race is a direct result of the historical racist practices of redlining [1] and white flight [2]. Now combine that with grinding poverty (also a result of redlining and segregation) and the war on drugs, and the result is high-crime neighbourhoods in the inner city. Those high crime neighbourhoods attract highly increased police presence, which leads to more convictions, which leads to more patrols, etc. This is another vicious cycle which feeds into the above statistical model.

[1] https://en.wikipedia.org/wiki/Redlining

[2] https://en.wikipedia.org/wiki/White_flight

I assume that the LSI-R is something that is actually trained based off of how much those factors actually predict the rate of recidivism, though, no? If friends/family/neighbors who have committed crimes is an accurate predictor of recidivism, the fact that black Americans in the inner city have more friends/family/neighbors who have committed crimes does not make the model racist. They're either good predictors or they're not. A black kid in the inner city with friends/family/neighbors who have committed crimes very likely does have a higher rate of recidivism than a white kid in the suburbs, and if this weren't true, but was being predicted by the model, then this would simply be a bad model. If it turns out that there are many black kids who happen to live near neighbors who've committed crimes, but actually do not have a higher rate of recidivism, then the model is racist only to the extent that it is using poorly correlated indicators of recidivism.

Your indicator for whether or not a model is racist cannot simply be that the model produces outputs that are delineated by race in such a manner that is unpalatable. So long as the model is actually not using race as a means of predicting outcomes, though, any behavior that is racist would simply be due to including poor features.
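One way to check the "good predictors or not" question is to compare predicted risk with observed outcomes inside each group. The sketch below assumes you already have records of (group, predicted risk, whether the person reoffended); the records, group names, and risk values are all invented for illustration and say nothing about how the LSI-R actually behaves.

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_risk, reoffended).
# Invented for illustration; not real LSI-R scores or outcomes.
RECORDS = [
    ("suburb", 0.2, 0), ("suburb", 0.2, 0), ("suburb", 0.2, 1),
    ("suburb", 0.2, 0), ("suburb", 0.2, 0),
    ("inner_city", 0.6, 1), ("inner_city", 0.6, 0), ("inner_city", 0.6, 0),
    ("inner_city", 0.6, 1), ("inner_city", 0.6, 0),
]


def calibration_by_group(records):
    """Mean predicted risk vs. observed reoffense rate, per group."""
    sums = defaultdict(lambda: [0.0, 0, 0])  # [predicted total, reoffenses, n]
    for group, predicted, reoffended in records:
        entry = sums[group]
        entry[0] += predicted
        entry[1] += reoffended
        entry[2] += 1
    return {
        group: {
            "mean_predicted": round(p_total / n, 3),
            "observed_rate": round(k / n, 3),
        }
        for group, (p_total, k, n) in sums.items()
    }


report = calibration_by_group(RECORDS)
for group, stats in report.items():
    print(group, stats)
```

In this made-up data the suburb predictions match the observed rate (0.2 vs 0.2), while the inner-city predictions overshoot it (0.6 vs 0.4), which is the "bad model" case described above. A model that passes this check can still be objected to on the grounds raised elsewhere in the thread.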

I think you’re still missing the point. Whether the model is accurate or not is beside the point. A completely accurate model may indeed show a higher recidivism risk for an inner city kid compared to one from the suburbs. If it’s used in sentencing or other life-affecting decisions then it’s going to amplify historical injustices.

People commit more crimes when they have less opportunity. People have less opportunity when they grow up in high crime neighbourhoods. This is a self-reinforcing feedback loop which was started by slavery and accelerated by segregation and redlining.

It’s not enough to use a hands-off approach. To correct the problem requires an active push in the opposite direction, to restore opportunity and break the cycles.

Edit: Think of it this way. You and some friends are playing Monopoly, drinking a few beers and having a great time. An hour and a half into the game (we all know games of Monopoly can last 4 hours or more), you discover one of your friends has been cheating. Now what?

He says "Sorry everyone! I'll stop cheating now and everything will be fine."

Is that true? Of course not. The proceeds from cheating may have been used to acquire the orange properties and maybe even put houses up on them. Every time you and the other friends land on those properties you end up paying rent to the previously cheating friend. Rent that he should not be collecting because those assets were acquired by cheating.

This is what it's like to have historical injustices continue to perpetuate into the future.

Yes, but you can't build it into your model. In this example, you would end up with a result of putting people back into these communities who have a high level of recidivism. You are actively not avoiding an actual issue because of perceived racial injustice when the issue is not racial.

This is the problem with processing our world down racial lines. You're trying to correct for a historical injustice. The fact that race factors into the circumstance of why people are where they are right now doesn't change the fact that those variables lead to recidivism. It's not racist. It's accurate.

If you want to fix the problem, then you need to fix the underlying issues, which tend to be economic. Those economic issues stem from an issue that affects all races, and therefore splitting it across racial lines only serves to reduce the possibility of actual change.

All you're doing when you try to account for historical injustice is slapping a band-aid on a deeper issue.

(Edit: Grammar)

I agree with you when it comes to the model: the model should be as accurate as possible. The big question is what to do with the model. The way it's being used now, the model is kind of a self-fulfilling prophecy. A prediction of high recidivism risk leads to a longer sentence which increases the likelihood of recidivism. This creates a feedback loop which increases real recidivism risk and the model changes to reflect that. If your goal is to reduce crime in society, then this may be a flawed approach.
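A toy simulation of that loop, with every number assumed for illustration: `baseline` is the reoffense risk with no sentence effect, `years_per_risk` converts a risk score into sentence length, and `harm_per_year` is the assumed increase in real risk per year served. None of these parameters come from the thread or from any real sentencing data.

```python
def simulate(initial_score, rounds=5, baseline=0.30,
             years_per_risk=10.0, harm_per_year=0.02):
    """Track a risk score as the model is re-fit on outcomes that its own
    sentencing recommendations influenced. All parameters are hypothetical."""
    score = initial_score
    history = [round(score, 3)]
    for _ in range(rounds):
        # Higher predicted risk leads to a longer sentence.
        sentence_years = years_per_risk * score
        # Assumption: each year served raises the real reoffense risk.
        actual_risk = min(1.0, baseline + harm_per_year * sentence_years)
        # The model is then updated to match the risk it observes.
        score = actual_risk
        history.append(round(score, 3))
    return history


history = simulate(initial_score=0.30)
print(history)
```

Under these assumptions the score starts at the baseline of 0.30 and settles near 0.375: the model stays "accurate" at every step, but the risk it measures is partly of its own making.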

Yes, but that's not about race, that's about how we deal with crime as a society. These things aren't being unfairly applied to minority communities and that's the point. The system would be working the same way for a non-minority community, and it does, where the economic situation is similar.

That's why the racial angle is a waste of everyone's time and energy. It's not the relevant issue. The more relevant issue is how we deal with crime prevention. Currently, we go with a punishment approach rather than a truly rehabilitative one. This also has a lot to do with economics, and lobbying and private prisons and so on. It's much more complicated than 'everybody's racist'.

I don't think this is a good analogy.

In your example, it could be argued that a person who isn't cheating can keep collecting rent on their properties (however "unfair" that might seem) - i.e. the (un)fairness of the current situation (and the degree to which we try to "fix" it) depends on the path used to get there.

In the "inner city kid" example, it doesn't matter how people got there - either due to historical injustices / racism (i.e. "cheating" by the rest of society) or simply because their parents were drunks or criminals or poor or whatever - so, again, race doesn't and shouldn't matter, and helping poor inner city black kids in preference to poor inner city white kids is racism, no other way of putting it.

You can have an accurate prediction which also reflects systemic bias.

There was a story recently about NYC cops being given race based targets for arrests. If that data was fed into a system and predictions generated they could be both correct and racist.

That's maybe an extreme example, I think the person you're replying to was trying to illustrate the same thing but with greater indirection between the racism and the arrest.

To give a non-race example, I've heard that ugly people get convicted at a higher rate than good looking people. So a 'hot or not' rating could help predict reoffending conviction rates. I'd assume we would want to adjust our models to avoid that, even though it's not an incorrect prediction.

But that's not an issue with the model, that's an issue with our response to the model.

Fundamentally there's nothing wrong with this. The somewhat harsh truth is that 'systemic bias' is actually just... statistics. There are a lot of minority criminals. It's not that there's something special about these people that makes them criminals - it's the same thing that makes everybody turn to crime: lack of opportunity, low capacity for upward mobility, limited access to education, and so on. We act as if the information is not accurate, and that this is a result of racism, but the cold truth is that if you were to take a white person and a black person and only look at the likelihood of criminal behavior, the black person is going to come out on top of that. It's just the math.

Where the human element comes in is where we decide what to do about that math. Do we blame the race? Do we utilize these models to preemptively police people on the basis of race? The answer, obviously, should be no. That said, our model can give us insights into this. We can take this data and go, 'well we know that it's unlikely that race is the major causal factor here, so what else can we look at?'

This is a much deeper issue, and 'structural racism' is a really bad way to look at it, because it forces you to focus on the racial elements, even if they're not relevant. It's asking for a model that is not representative of reality, because it looks ugly, rather than looking at it for what it is - just data - and figuring out what to do with that data.

The article mentions why simply excluding race isn't good enough:

"Crucially, incorporating more proximal and predictive variables into models, rather than relying on race variables to act as proxies, will improve transportability of algorithms across contexts."

If we want better models then they need to also model structural racism.

> If you don't feed race in as a feature, however, the outputs are hardly racist.

The article addresses this

> When “race-neutral” approaches are employed in model development, prediction will tend to be poorer for racial minority populations.... Two explanations for differentially poorer model performance can be addressed by collecting more data: too few observations of members of racial minority groups and unrepresentative sampling that can differentially limit generalizability. However, an additional cause of algorithmic bias is not well appreciated and cannot be overcome simply by adding more of the same kind of data to a learner....

That is a completely different argument, and has nothing to do with structural racism. That is literally just saying that minorities are less likely to have made up a sizable portion of the data sets trained on, because they're minorities, and the model is potentially less well suited to deal with issues specifically related to that minority. If the primary point of the article was that we should overcorrect for this by including a disproportionately high representation of minority data, then that's a potentially reasonable case, so long as it doesn't break the model. In the case of facial recognition not working as well on non-whites, for example, I think it's an entirely reasonable case to make to include a disproportionately higher amount of training data on those areas where the model fails to perform its function.

But you also have to realize that this is always going to be somewhat arbitrary.
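A small sketch of the trade-off described above, under loud assumptions: the data is invented, the "model" is a one-feature threshold rule, and the two groups are constructed so that the best threshold differs between them. It compares fitting with every sample weighted equally against fitting with each group given the same total weight, which is one simple stand-in for over-representing the smaller group.

```python
# Hypothetical 1-D data: (group, x, label). Invented for illustration.
# Majority labels flip at x = 5; the two minority samples flip at x = 3.
DATA = [("majority", x, int(x >= 5)) for x in range(1, 9)] + [
    ("minority", 2, 0),
    ("minority", 3, 1),
]


def fit_threshold(data, weights):
    """Pick t minimising the weighted error of the rule 'predict 1 if x >= t'."""
    def weighted_error(t):
        return sum(w for (_, x, y), w in zip(data, weights) if int(x >= t) != y)
    return min(range(1, 10), key=weighted_error)


def accuracy_by_group(data, t):
    """Fraction of correct predictions per group for threshold t."""
    hits, totals = {}, {}
    for group, x, y in data:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(int(x >= t) == y)
    return {g: hits[g] / totals[g] for g in totals}


counts = {g: sum(1 for d in DATA if d[0] == g) for g in ("majority", "minority")}
uniform = [1.0] * len(DATA)                      # every sample counts the same
balanced = [1.0 / counts[g] for g, _, _ in DATA]  # every group counts the same

t_uniform = fit_threshold(DATA, uniform)
t_balanced = fit_threshold(DATA, balanced)
print(t_uniform, accuracy_by_group(DATA, t_uniform))
print(t_balanced, accuracy_by_group(DATA, t_balanced))
```

With equal sample weights the fit follows the majority (threshold 5, accuracy 1.0 for the majority and 0.5 for the minority). With balanced group weights it moves to threshold 3, which is perfect for the minority but drops the majority to 0.75. Where to stop between those two is the "somewhat arbitrary" part.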

> that a model could easily mistake race as a causal factor.

Statistical models as used in real world systems don’t have a concept of a “causal factor”. It literally doesn’t matter for the model why in certain zip codes, there is more property crime. It doesn’t care if it’s caused by poverty of residents, by pigmentation of their skin, by lead in the paint, or by the cultural traits of residents. All it cares about is the correlation: if the risk is higher, the insurance premiums go up too. For some it might seem unfair, and for some groups such statistical discrimination might be illegal (though not for all, e.g. it’s perfectly legal to charge men higher insurance rates, which suggests that the moral principle here is not equality, but rather compensation for historical mistreatment), but from the model’s and the business’s perspective, such reasoning is undoubtedly correct.
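As a rough sketch of what "all it cares about is the correlation" looks like in code, here is a pricing rule that scales a base premium by each zip code's observed claim rate relative to the overall rate. The zip codes, claim counts, and base premium are all made up, and real insurance pricing is far more involved; the point is only that nothing in the calculation represents why the rates differ.

```python
# Hypothetical claim history: zip code -> (policies, property-crime claims).
# Zip codes and counts are invented for illustration.
CLAIMS = {"11111": (1000, 20), "22222": (1000, 50)}
BASE_PREMIUM = 100.0  # assumed flat base price


def premiums(claims, base):
    """Scale the base premium by each zip's claim rate relative to the overall rate."""
    total_policies = sum(n for n, _ in claims.values())
    total_claims = sum(c for _, c in claims.values())
    overall_rate = total_claims / total_policies
    return {
        zip_code: round(base * (c / n) / overall_rate, 2)
        for zip_code, (n, c) in claims.items()
    }


prices = premiums(CLAIMS, BASE_PREMIUM)
print(prices)  # {'11111': 57.14, '22222': 142.86}
```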

Re: the bank loan scenario. The model was implemented by people. The assumption is everyone considered for a loan has had a fair opportunity to reach the threshold to be approved. That is not the case.

"A national effort to correct historical injustices" is one way but not the only. The people who create these models can refine the model or create others that determine acceptable business risks to provide loans to an under-served market.

> The assumption is everyone considered for a loan has had a fair opportunity to reach the threshold to be approved

What? In what world is this assumption being made? Do we assume that every person was born into a stable household? That every person has the same IQ, the same height, looks, had the same lucky encounters with the right people whose needs intersected with their capabilities? There are countless dimensions along which people are not the same, why would you ever assume that whether or not someone gets a bank loan has taken into account every advantage or disadvantage they have been given?

A bank loan is a business transaction where the likelihood of you being able to pay back the loan at the prescribed interest rate is being determined based off of highly predictive features, that's it.

If you want to do corrective social justice, do it in a handful of places, and let the rest of the system operate off of sensible rules. Social justice cannot permeate every single decision made in our society, it is irreducibly complex even on a single decision.

The supreme court had a different take on it.

You should check out the disparate impact rule if you're involved in housing.

https://www.hud.gov/press/press_releases_media_advisories/HU...

Other commenters have imo adequately addressed the major flaws in this comment/argument so I won't be redundant here.

That being said, what is unfortunately NOT shocking is that anyone upvoted this comment at all and that it doesn't have a negative score.

Pretending something doesn't exist (or ignoring the fact that it does exist), and modeling systems under that pretense, doesn't make the thing not exist - it only reinforces the existence of the thing.

Downvoted by racists again.

I can understand the parent commenter downvoting, esp if not convinced by the position of the article or the subsequent comments.

I knew better than to wade into this topic on this site given the audience demo of this site, I am hardly surprised at the reaction by lurkers.

I am encouraged and heartened by the commenters that actually read the article and have provided excellent reasons why you would want to build systems and models that account for systemic racism and bias.

Very illuminating all around.

> If you don't feed race in as a feature, however, the outputs are hardly racist.

Isn't the argument being made in various places that race is there in the data regardless of whether or not you encode it as a feature, because the humans that create the data already used race as part of creating the data for the model? And by creating the data, I mean the interactions in real life that create the inputs.

You can't escape it because it's already in the inputs to the model because it's a rather insidious part of our society.

People seem to be trying to answer the practical question "How do we build a non-racist model?", the answer to which depends entirely on the philosophical question "What does it mean for a model to be racist?", the answer to which no one can seem to agree on.

I think you're confusing race with racism.

Generally speaking..

Ignoring or dismissing relevance of race is privilege for those in the majority who can trivialize something that doesn't apply to them. Apply that to models too and see how much they miss. Value can be where others might not.

It's kind of like making a decision that if one group hasn't had an experience of race, it doesn't mean anyone else could have either. It also signals that if they don't see value or understanding in it, therefore there can't be any in it.

It takes a truly open mind to entertain any viewpoint that isn't immediately their own.

The reality is, many people live with a reality that might seem unimaginable to viewpoints like the above.

I practice each day as if humanity is one family. I go out of my way to talk to every kind of person that doesn't look like me. It doesn't mean strangers talk to me, especially from the majority..

When programmers all think and look the same and grew up in the same way and places, software tends to catch fewer edge cases for everyone who doesn't look like them, in areas of computer vision.

If we step beyond CV, we can look into hilarious things like automatic motion-sensor sinks only detecting certain shades (or lack thereof) of skin.

Just some food for thought, happy to chat offline too :)

The argument is that upstream often doesn't have (or is willfully blind to having) a clear understanding of the features impacting the results of the model. The author is making the case that model builders need to be aware that their models may further structural racism, and need to do the work to push this awareness upstream.

Upstream may think the model--which to them looks like a black box--is perfectly rational, optimally profitable and socially beneficial in a way that it is not. We have numerous examples where a computer has driven a decision and humans carried out its orders in ways that harmed people. Remember the man violently dragged off the United Airlines flight in 2017[1]? ICE justifies detention by tweaking their risk management software to always recommend detention[2].

This is why we need to care as people building the systems that make these decisions.

1. https://www.nbcnews.com/storyline/airplane-mode/united-fiasc...

2. https://www.vice.com/en_us/article/evk3kw/ice-modified-its-r...

> Does the fact that black Americans used to be slaves in the US, or were kept out of certain housing markets, contribute towards the fact that black Americans, on average, have lower credit scores, income, wealth, and collateral? Of course. But is this model racist? Literally not at all.

Saying that because history was racist means I'm absolved of responsibility going forward is not a strong argument. Redlining was literally a racist behavior. The point of the article is to be aware of it so you can try to do better than in the past.