by tejohnso 1493 days ago
"This issue creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists would not be able to tell, thereby possibly leading to errors in health-care decision processes."

Why would a model rely on its ability to detect racial identity to make decisions?

What kind of errors are race-specific?


Let's say you're trying to train a model to predict if a patient has a cancerous tumor based on some imaging data. You have a data set for this that includes images from people with tumors and people without, from all races. However, unbeknownst to you, most of the images from people of race X had tumors and most of the images from people of race Y did not have tumors.

If the AI is also implicitly learning to detect race from the images, it's going to learn an association that people of race X usually have tumors and people of race Y usually do not.

The problem here is that the people training the model and the clinical radiologists interpreting data from the model may not realize that race was a confounding factor in training, so they'll be unaware that the model may make racial inferences in the real world data.

If people of race X really do have a higher incidence rate for a specific type of cancer than race Y, maybe this is OK. But if the issue is that there was bias in the training/validation data set that was unknown to the people building the model, and in the real world people of race X and race Y have exactly the same incidence rate for this type of cancer, then this is going to be a problem because it's likely to introduce race-specific errors.
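To make that concrete, here is a toy simulation of the scenario, a minimal sketch and not anything taken from the paper. Everything in it is an assumption made up for illustration: the function names, the 80%/20% tumor rates in the training set, the 50%/50% rates at deployment, and the idea that the model sees race as a clean 0/1 `group` feature (a real imaging model would pick it up implicitly). It trains a small logistic regression on the skewed data, then measures how often healthy patients in each group get flagged once both groups have the same true incidence.

```python
import math
import random

def make_scans(n, tumor_rate_by_group, rng):
    """Return (signal, group, tumor) rows; `group` is a hypothetical stand-in
    for whatever race proxy the model picks up from the image."""
    rows = []
    for _ in range(n):
        group = rng.randint(0, 1)  # 0 = "race Y", 1 = "race X"
        tumor = 1 if rng.random() < tumor_rate_by_group[group] else 0
        signal = rng.gauss(1.0 if tumor else 0.0, 1.0)  # noisy genuine tumor evidence
        rows.append((signal, group, tumor))
    return rows

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, epochs=300, lr=0.5):
    """Plain full-batch gradient descent on the logistic loss."""
    w_signal = w_group = bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_signal = grad_group = grad_bias = 0.0
        for signal, group, tumor in rows:
            err = sigmoid(w_signal * signal + w_group * group + bias) - tumor
            grad_signal += err * signal
            grad_group += err * group
            grad_bias += err
        w_signal -= lr * grad_signal / n
        w_group -= lr * grad_group / n
        bias -= lr * grad_bias / n
    return w_signal, w_group, bias

def false_positive_rate(rows, weights, group):
    """Share of tumor-free patients in `group` that the model flags anyway."""
    w_signal, w_group, bias = weights
    healthy = [s for s, g, t in rows if g == group and t == 0]
    flagged = sum(1 for s in healthy
                  if sigmoid(w_signal * s + w_group * group + bias) > 0.5)
    return flagged / len(healthy)

rng = random.Random(0)
# Skewed training set: race X mostly has tumors, race Y mostly does not.
train_rows = make_scans(2000, {0: 0.2, 1: 0.8}, rng)
# "Real world": both groups have exactly the same incidence.
deploy_rows = make_scans(2000, {0: 0.5, 1: 0.5}, rng)

weights = train(train_rows)
fpr = {g: false_positive_rate(deploy_rows, weights, g) for g in (0, 1)}
print("learned weight on group:", round(weights[1], 2))
print("false positive rate by group:", fpr)
```

The model puts a positive weight on `group`, so healthy patients of race X get flagged far more often than healthy patients of race Y, even though nothing about their tumor evidence differs. Someone reading only the model's output would have no way to see that.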

Just because the model relies on race in some way doesn’t mean that we know it relies on it. I.e., the model is, unbeknownst to us, biased on race in inaccurate ways.
Presumably the model would actually be biased on race in accurate ways, if it found the correlation itself.
I could be entirely wrong here, so if you've got more context in this area by all means correct me.

Consider an "AI" that rates the probability of recidivism for prisoners nearing their parole date. That score would then be presented to the parole board, and taken into consideration in determining whether or not to grant parole. If this AI were accidentally/incidentally accurately determining the race of the prisoner, then the output score would take that into account as well. Black men have a recidivism rate significantly higher than other groups[1]. The reasons for the above aside - it's a complex topic, and outside the scope of this analogy - this is extremely undesirable behavior for a process that is intended to remove human biases.

You might then ask, how does this relate to medical imaging? Medical decisions are regularly made based on the expected lifespan of the individual. It makes little sense to aggressively treat leukemia in a patient who is currently undergoing unrelated failure of multiple organs. Similarly, it would likely make sense for a healthy 30-year-old to undergo a joint replacement and associated physical therapy, because that person can reasonably be expected to live for an additional 40 years, while the same treatment wouldn't make sense for a 70-year-old with long-term chronic issues. This concept is commonly represented as "QALY" - "quality-adjusted life years".

Life expectancy can vary significantly based on race[2].

An AI that evaluates medical imagery and considers QALY in providing a care recommendation may produce a positive indicator for a white Hispanic woman and a negative indicator for a black non-Hispanic man, with all else being equal and race as the only differentiator.

In short - it's not necessarily a bad thing for a model to be able to predict the race of the input imagery. The problem is that we don't know why it can do so. Unless we know that, we can't trust that the output is actually measuring what we intend it to be measuring.

1: https://prisoninsight.com/recidivism-the-ultimate-guide/

2: https://www.cdc.gov/nchs/products/databriefs/db244.htm

At the risk of discussing sensitive topics on a platform ill-suited:

If, in your hypothetical recidivism case, an AI "accurately" determined that a pattern of higher recidivism-related features was correlated to race, and was able to determine "accurately" that the specific subset of recidivism-related features predicted race, why would it be wrong to make parole decisions using those recidivism-related features?

Because both the original conviction and any recidivism are determined through the decision-making of people who are aware of race and racial stereotypes. The AI would just be laundering the decisions you were already making, not improving them.

edit: imagine I was a teacher who systematically scored people with certain physical characteristics 10% lower than people who didn't have them. Let's say, for example, that I was a stand-up comedy teacher that wasn't amused by women.

If I used an AI trained on that data to choose future admissions (assuming plentiful applicants), I would end up with an all-male class. If this happened throughout the industry (especially noting that the all-male enrollment that I have would supply the teachers of the future), stand-up comedy would simply become a thing that women were seen as not having the aptitude to do, although nobody explicitly ever meant to sabotage women, just to direct them into something that they would have a better chance to succeed in.
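A rough sketch of that laundering effect, under made-up assumptions: the `Applicant` fields, the skill and audition distributions, the class size, and the "model" itself (audition result plus a per-group correction learned from past grades) are all hypothetical and chosen only to keep the example small. The one thing taken from the scenario above is the 10% markdown on women's grades.

```python
import random
from typing import NamedTuple

class Applicant(NamedTuple):
    is_woman: bool
    skill: float     # true ability, never seen by the model
    audition: float  # noisy but unbiased measurement of skill
    score: float     # teacher's grade, marked down 10% for women

def make_applicants(n, rng):
    people = []
    for _ in range(n):
        is_woman = rng.random() < 0.5
        skill = rng.gauss(70, 10)  # same skill distribution for everyone
        audition = skill + rng.gauss(0, 5)
        score = skill * (0.9 if is_woman else 1.0)
        people.append(Applicant(is_woman, skill, audition, score))
    return people

def learn_offsets(past):
    """'Train' on past grades: average gap between grade and audition per group."""
    offsets = {}
    for flag in (False, True):
        gaps = [p.score - p.audition for p in past if p.is_woman == flag]
        offsets[flag] = sum(gaps) / len(gaps)
    return offsets

def share_of_women(people):
    return sum(1 for p in people if p.is_woman) / len(people)

rng = random.Random(1)
past = make_applicants(20000, rng)        # students the biased teacher graded
applicants = make_applicants(20000, rng)  # new applicant pool
offsets = learn_offsets(past)

N_ADMIT = 1000
# Admit the top applicants by the model's predicted grade.
by_model = sorted(applicants,
                  key=lambda p: p.audition + offsets[p.is_woman],
                  reverse=True)[:N_ADMIT]
# For comparison: who would be admitted if true skill were observable.
by_skill = sorted(applicants, key=lambda p: p.skill, reverse=True)[:N_ADMIT]

women_by_model = share_of_women(by_model)
women_by_skill = share_of_women(by_skill)
print("women admitted by model:", women_by_model)
print("women admitted by true skill:", women_by_skill)
```

In this toy version the class comes out heavily skewed rather than all-male, because the model still sees an unbiased audition. The point is the same: the model faithfully reproduces the teacher's markdown, and nothing in its output shows that the gap came from the grading rather than from the applicants.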

If you decided on race, in this instance, you would be treating people as far more determined by their race than they actually are. Race is too broad a concept to reliably say that all white people are at X chance of recidivism. Instead we want to know if Marlowe is at risk of high recidivism based on her character.
Both responses address the problem with a human making a biased decision based on race, which I think we mostly all agree would be bad.

The question I was posing is different, though, because this was discussing an AI system that looked at the underlying [in this case, recidivism] data which had race and race-adjacent information removed, and the AI has effectively rediscovered the concept of "race" by connecting it to some set of attributes of the actual [in this case, recidivism-predicting] features. If the AI were to determine such a link, that doesn't make its results biased, it just makes them uncomfortable. It's not clear to me that, in such a case, we should remove those [recidivism-predicting] features from the dataset just because they ended up being correlated to race.

Maybe, maybe not. Hard to say, which is the problem they call out in the paper:

> efforts to control [model race-prediction] when it is undesirable will be challenging and demand further study

The correlation being "undesirable" to the individuals doing the research does not mean that the correlation is inaccurate.

I mean, sure, there are tons of ways for garbage data to sneak into ML models -- though these guys tried pretty hard to control for that -- but if the model actually determined that "race" is a meaningful feature, then that might be because it is, and science should be concerned with what is, not with what we wish were.

If one believes and proclaims that they have controlled for variable X, but they haven’t actually done so, then their results and analysis may well be invalid or misleading because of that. Whether they actually should have controlled for X or not is orthogonal.
Oh, yes, sorry. If by the correlation being possibly-undesirable you meant that it was possibly-spurious due to incompletely controlling for some bias in the source data, then yes, conclusions based on a model which found such a spurious correlation caused by incomplete input control might be undesirably biased in a not-accurate fashion.

This study appears to have done a good job controlling for known biases that could have been proxies for race, but it is presumably possible that they missed something and tainted the data.

Using race as an independent factor to make medical decisions isn’t unheard of today. The medical community is largely trying to stop doing that as a matter of social policy, so it’s a problem for that goal if an AI model might be doing it under the hood.

See e.g. https://www.ucsf.edu/news/2021/09/421466/new-kidney-function...