| HN Mirror



	by aulin 1493 days ago
	what's this enormous risk they're talking about? racial bias in x-ray reading? race can be a risk factor in plenty of diseases, why should we actively try to remove this information from medical images?

9 comments

matthewdgreen 1493 days ago

"This issue creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists (who do not typically have access to racial demographic information) would not be able to tell, thereby possibly leading to errors in health-care decision processes."

towaway15463 1493 days ago

Without knowing the actual outcome, isn’t there also a possibility of error due to not knowing the race of the individual? They used mammogram images in the study and it is well known that incidence of breast cancer varies by race. Removing that information from the model could result in worse performance.

cameldrv 1493 days ago

Well one thing you wouldn’t want to do is take the output of this model and then apply a correction factor for race on top of it, because the model is already taking that into account.

towaway15463 1493 days ago

Is that true or would it help as a tie breaker in cases where the confidence was just at or below the threshold?

cameldrv 1492 days ago

Well I suppose you only care about a correction factor to a binary model when it breaks a tie. You wouldn't want to apply a tiebreaker correction twice though.
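The double-counting concern can be made concrete with a little log-odds arithmetic. This is a minimal sketch, not anything from the paper: the base rates, the score, the threshold, and the `apply_prior_shift` helper are all made-up assumptions for illustration. It shifts a borderline score by a group-specific prior once (the tie-breaker case) and then a second time (the case where the model had already accounted for it).

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def apply_prior_shift(p: float, group_rate: float, overall_rate: float) -> float:
    """Move a score from the overall base rate to a group base rate (hypothetical helper)."""
    shift = logit(group_rate) - logit(overall_rate)
    return sigmoid(logit(p) + shift)


# Hypothetical numbers, chosen only to illustrate the effect.
overall_rate, group_rate = 0.10, 0.15
threshold = 0.5
race_blind_score = 0.45  # just below the decision threshold

# Applying the correction once can legitimately break the tie.
once = apply_prior_shift(race_blind_score, group_rate, overall_rate)
# Applying it again counts the same evidence twice and inflates the score further.
twice = apply_prior_shift(once, group_rate, overall_rate)

print(f"race-blind score: {race_blind_score:.3f}")
print(f"corrected once:   {once:.3f}")
print(f"corrected twice:  {twice:.3f}")
```

If the model's output already reflects the group base rate, the "corrected once" value is effectively what it reports, so any further correction lands on the "corrected twice" value.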

ibejoeb 1493 days ago

Typically? It's coded in the standard. There's a DICOM tag for it.

https://dicom.innolitics.com/ciods/procedure-log/patient/001...

matthewdgreen 1493 days ago

Unlike the authors of this research paper I am not a trained clinician, so I can't tell you. However I would note that the first exemplary value in the link you gave me is "REMOVED".

ibejoeb 1493 days ago

It doesn't provide example data, but there's still a spot in the standard for it. The values can differ by modality or manufacturer. Sure, it's not required, but certainly it's very important in some situations. Consider dermoscopy.

If interested, searching for "dicom conformance" should yield lots of docs that probably contain specific values for those things.

ska 1493 days ago

FWIW, the standard printed out is multiple linear feet of shelf space. There is a spot for a lot of things.

One common issue is that a lot of these kinds of tags rely on optional human input and are inconsistently applied, as opposed to, say, modality-specific parameters produced by a machine, which are consistent.

DICOM is a great example of design by committee, with the +'ve and -'ves that implies.
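One way to see how unevenly an optional field is filled in is to tally it across a batch of headers. A minimal sketch, assuming the headers have already been parsed into plain dictionaries; the `EthnicGroup` key, the `tag_coverage` helper, and the sample values are illustrative assumptions, not data from any real archive.

```python
from collections import Counter


def tag_coverage(headers, keyword):
    """Tally whether an optional header field is absent, empty, or populated."""
    counts = Counter()
    for header in headers:
        value = header.get(keyword)
        if value is None:
            counts["absent"] += 1
        elif not str(value).strip():
            counts["empty"] += 1
        else:
            counts["populated"] += 1
    return counts


# Made-up headers standing in for parsed image metadata.
headers = [
    {"Modality": "MG", "EthnicGroup": "REMOVED"},
    {"Modality": "CR"},
    {"Modality": "CR", "EthnicGroup": ""},
    {"Modality": "MG", "EthnicGroup": "Asian"},
]

print(dict(tag_coverage(headers, "EthnicGroup")))
print(dict(tag_coverage(headers, "Modality")))
```

Note that "populated" only means the field holds some text; a placeholder such as "REMOVED" still counts, so the real coverage can be lower than the tally suggests.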

nradov 1493 days ago

I don't understand that part. All modern EHRs have a field for self-reported race, and clinical radiologists do typically have access to that information. (Whether they actually look at it, or whether it's useful when reading images, are separate issues.)

aulin 1493 days ago

ok, maybe it's a US-specific thing, but why wouldn't a clinical radiologist have all the information he can gather about his patient, including race, to help the diagnosis?

codefreeordie 1493 days ago

Because in the US we are required to pretend that there is no such thing as race and no such thing as gender, and all people are exactly and precisely the same and there can be no differences.

Loughla 1493 days ago

Not to get into a flame war, but I want to present an alternate option to yours.

Because in the US some people have a hard time understanding that all races and genders deserve to be treated equally as humans with the same access to goods and services. Further, that there are disparities in care based on race/ethnicity[1][2] and gender[3][4] because of that racism/sexism present in the systems. This then leads to requiring that race/ethnicity and gender data be scrubbed sometimes to keep people from impacting outcomes based on their own biases.

[1] https://www.americanbar.org/groups/crsj/publications/human_r...

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1924616/

[3] https://www.americashealthrankings.org/learn/reports/2019-se...

[4] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965695/

codefreeordie 1493 days ago

It sometimes makes sense to scrub race/ethnicity/gender information from certain types of data, typically when a human is going to be making individual decisions.

For example, not having race data on resumes is generally productive, because that categorization can't provide a meaningful input to the decision associated with an individual person. Even if it were to be the case that there was some correlation between race and skill at whatever job you're interviewing for[1], the size of the effect is almost certainly small, and in the meanwhile you've also controlled for any bias in the person doing the reviewing.

If you're having a machine look at a dataset, and the machine determines that race or ethnicity is a material factor in determining some attribute in that dataset, you're not doing anybody any good by denying that fact and destroying the result.

[1] Let's ignore, for the purposes of this discussion, fields (like certain sports) where extreme competition combines with a position heavily dependent upon racially-linked physical characteristics. Though even in this case, there is still a (different, weaker) argument for suppressing race data in "resumes" (yes, I know, ballplayers don't submit resumes to their local NBA franchise).

pessimizer 1493 days ago

Race is a rough, subjective, culturally-bound summary of characteristics. If you're already evaluating characteristics, adding either your guess of race or a self-reported race is like injecting gossip into good data.

If the outcome that you're trying to predict is also affected by perceptions of race, you've built a gossip feedback loop.

nerdponx 1493 days ago

Then you should be looking at ethnicity and not "race" as such. For example, Ashkenazi Jews as an ethnic group are genetically very distinct from other Europeans, but are generally considered "white" on self-reported race surveys.

bumby 1493 days ago

>If you're having a machine look at a dataset, and the machine determines that race or ethnicity is a material factor in determining some attribute in that dataset...

I think the trickiness is in providing the machine unbiased data to begin with, so that it doesn't learn incorrect associations between features like race. The most egregious examples I'm aware of are the machine learning systems used to suggest criminal sentencing, but, apropos of this topic, I believe there are cases where it may produce erroneous associations in something like skin cancer risk.

sandworm101 1493 days ago

>> Because in the US we are required to pretend that there is no such thing as race

Then you are not pretending very well. When I lived in the US I was shocked at how often it was an issue. It permeates nearly every aspect of US culture.

The icing on that cake: A government-run interactive map so you can look up which races live in which neighborhoods. Some versions allow you to zoom in to see little dots representing clusters of black or white residents. https://www.census.gov/library/visualizations/2021/geo/demog...

nradov 1493 days ago

Actually, the US federal government specifically recommends that healthcare providers record patients' race, ethnicity, assigned sex, and gender identity. Most of those elements are self identified.

https://www.healthit.gov/isa/uscdi-data-class/patient-demogr...

iancmceachern 1493 days ago

Interesting, this is like the dog learning calculus thing. We may create an AI that could perceive things that we aren't able to, or perceive things differently, because we're "limited" in a way that the AI isn't. We wouldn't be able to even tell this is going on, because we don't have the mental model in place to account for it to understand it. We'd be the dog.

KaiserPro 1493 days ago

> racial bias in x-ray reading?

no, it implies there is a signal in the dataset that could be something other than clinical. This means that until they can pinpoint the cause, or the thing the AI is detecting, all the other things it predicts are suspect.

i.e. if the AI thinks the subject is west african, then it might be more inclined to diagnose something related to sickle cell.

Or a north western european woman in her mid 60s vs a japanese woman might get wildly different bone density readings for the same level of "blob" (most medical imaging is divining the meaning of blobs and smears).

fumblebee 1493 days ago

My first thought here is to relate this to the problem of early colour film, which was largely tested and validated with only light skin tones in mind. Once it was put out into the wild, folks with darker skin tones found the product to be total crap. Why? Because there was a glaring OOD (Out of Distribution) problem during testing.

Similarly, if the train/test sets used here for X-ray-based diagnostics with Machine Learning cover only specific races, then the performance might be worse for other races, given that there's a new discriminatory variable in play.

The obvious solution here is to reduce bias by ensuring race is part of the dataset used for training and testing. Which, due to PII laws in play, may actually be quite challenging! Fascinating tradeoff imo.
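A quick way to check for this kind of gap is to break an evaluation metric down by group instead of reporting a single aggregate. A minimal sketch, assuming group labels are available for the test set; the `sensitivity_by_group` helper and the records below are invented for illustration and are not results from the study.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the true-positive rate per group from (group, label, prediction) triples."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for group, label, prediction in records:
        if label == 1:
            positives[group] += 1
            if prediction == 1:
                hits[group] += 1
    return {group: hits[group] / positives[group] for group in positives}


# Invented test records: (group, true label, model prediction).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0),
]

# Aggregate sensitivity hides the difference between the two groups.
true_positives = sum(1 for _, label, pred in records if label == 1 and pred == 1)
all_positives = sum(1 for _, label, _ in records if label == 1)
overall = true_positives / all_positives

print("overall:", overall)
print("by group:", sensitivity_by_group(records))
```

The same breakdown is only possible when the group label is kept alongside the test data, which is where the tradeoff with privacy rules comes in.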

ibejoeb 1493 days ago

I don't get it either. It's accurate. It would be a problem if it got it wrong, which could, for example, underweight quantitative genetic data and adversely influence differential diagnosis.

Retric 1493 days ago

AI is driven by the training sets, but the goal is to find the underlying issues.

Suppose AI #1 got a higher score on the training data and AI #2 had a more accurate diagnosis. Obviously you want #2 but if there is bias in the training data based on race and the AI has access to race then eventually you overfit into #1.

pdpi 1493 days ago

ML models are great tools, but they're way too much of a black box. What you have here is a model that's predicting something you think it shouldn't have been possible to predict, and you can't simply ask it where that prediction comes from. Absent an explanation for how the model is doing this, you have to consider the possibility that whatever is poisoning that prediction will also poison others.

axg11 1493 days ago

> ML models are great tools, but they're way too much of a black box.

A human doctor is also a black box, in meat form.

dekhn 1493 days ago

yep, the case for "enormous risk" hasn't been well articulated. It's been repeated a lot, but of all the problems in medical care, this isn't one of the larger ones.

unsupp0rted 1493 days ago

What if it turns out that humans have identifiable biological differences among genetic sub-groups, ethnicities, etc? It would be anarchy in the social sciences.

sim7c00 1493 days ago

soon they will want to remove race indicators for photographs and tik tok videos. who knows, maybe it's racist to be of a race >.>