Hacker News
by DoreenMichele 2449 days ago
There's a long, long history of leaving women, people of color, poor people and other groups out of data sets. For example, I've read articles that indicate we can't create good photos of people of color because film standards were normalized to white skin.

So, try to fix that and... there's hell to pay?

File under: "No good deed goes unpunished."

3 comments

I don't think there is anything wrong with the memo saying "we need more people of color in our data set." I hope everyone agrees with that.

What seems to have been bad is the contractor misinforming people about what data would be collected (and for what use), and it's not clear what Google had in their contract to prevent that kind of unethical behavior. It is also very questionable IMO to target the homeless "because they won't talk to the media" which was allegedly in the instructions the contracting firm Randstad gave to its workers.

Disclaimer: While I work as a low-level employee on an unrelated team at Google, my opinions are my own and do not represent those of my employer, and this is the first I am hearing of this.

It also seems bad to target the homeless "because they won't talk to the media" or whatever the quote was.

To me, this just parses as "We have some new Politically Correct excuse to exclude poor people from our dataset."

Being so unimportant that the world wants you to remain invisible isn't generally a good thing.

There is always some excuse. There is no condition under which it is sufficiently respectful, politely handled, blah blah blah to be A Good Idea.

It's not as easy a problem as you imply. It's not so much that you are "leaving out" groups as that you have to go out of your way to include minorities in your data set, by the definition of the word minority. This can make your project orders of magnitude more complex.

No matter what you make, some minority corner case will break your tech and generate outrage. ("How DARE your speech recognition not work on AAVE!", "How DARE your facial recognition not work on burn center victims!" etc.)

But now they have a dataset where a disproportionate share (possibly the vast majority) of the people of color represented were homeless.

That's bound to introduce other kinds of bias into the data.

Such as what? Some bizarre and unfounded idea that poverty and skin color have some correlation?
Such a correlation may or may not exist, but surely it's evident that no conclusion either way could be drawn from a dataset constructed using Google's method.

And yet imbalanced datasets are used all over the place, e.g. to identify "criminals" in China (https://www.newscientist.com/article/2114900-concerns-as-fac...) and the US (https://www.engadget.com/2019/08/14/aclu-facial-recognition-...)

I spent time homeless. I was frequently mistaken for a tourist based on how I looked, because of my casual clothing (usually a t-shirt and sweatpants). People typically figured out I was homeless based on my habits, not my appearance.

I'm off the street. I still look and dress the same, in part because I currently do freelance work from home. I don't have to meet a dress code.

While homeless and in downtown San Diego, I fairly often gave away food I had been given but couldn't eat, either because of dietary restrictions or time limits (in that a large amount of stuff that should be refrigerated would spoil before I could eat it). I tried to offer it to other homeless people mostly.

One woman who panhandled regularly was reluctant to accept too much food from me, explaining "I'm not homeless." She panhandled because she was a retiree in high-priced downtown San Diego living on a fixed income. I told her to take it home, stick it in the fridge and eat some tomorrow. I assured her it was fine, I didn't have a fridge.

Another woman got mad at me for offering and told me to feed it to my dog. She was sitting on a curb in a neighborhood near a lot of homeless services where sitting on the curb outside was often a sign of homelessness.

She was also black and I'm white. She likely lived in the apartment building she was in front of and probably thought I was being a racist bitch. She was insulted at my sincere offer of charity and attempt to give away most of the fresh fruit I had been given so it wouldn't go to waste.

There are a lot of stereotypes about what homeless people look like. The reality is that there are a lot of homeless people with jobs and/or attending college and/or living in their car who successfully manage to pass for "normal" much of the time.

I have no idea what criteria were used by Google to target homeless people, but I'm skeptical that the dataset:

A. Is representative of homeless people generally.

B. Was chosen based on people looking homeless, rather than people behaving homeless.

C. Actually reflects a 100% match between people believed to be homeless and people who really were homeless.

The examples you give are blatant misuses of data sets. How you source the data has little bearing on the dumb ideas people come up with for how to use it.