by simonw 1746 days ago
This happened to both Google Photos and Flickr too. Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?

Google Photos in 2015: https://www.wired.com/story/when-it-comes-to-gorillas-google...

Flickr in 2015: https://www.independent.co.uk/life-style/gadgets-and-tech/ne...


The reason these companies don't fix these systems is because they don't know how. It is easier to remove certain outputs or retire the whole system. There is no line of code they can tweak.
Enrich the dataset it's trained on enough that the model is correct before you release it to prod.
That's sort of obvious. How do you know that wasn't attempted?
Even if it was fixed, in a probabilistic system like this, isn't it basically guaranteed to happen with some inputs?
Is that a real question? Of course it will happen. In this particular case there was a single misclassified video reported in the article.
Yes, it's a real question, since nothing says that a particular misclassification must happen. Watching cars go by on the road, one might suspect that at least one is driven by an alligator, but nothing says that it must be, not even the law of large numbers.
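Both sides of this exchange can be put into numbers with a rough sketch. It assumes every prediction fails independently with the same small rate, which is a simplification (real errors cluster on similar inputs), and the rate used below is purely illustrative. Over enough inputs at least one error becomes close to certain, yet no specific input is ever guaranteed to fail.

```python
def prob_at_least_one_error(per_item_error_rate: float, num_items: int) -> float:
    """Probability that at least one of num_items independent predictions is wrong."""
    if not 0.0 <= per_item_error_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    # Complement of "every single prediction is correct".
    return 1.0 - (1.0 - per_item_error_rate) ** num_items


if __name__ == "__main__":
    # Illustrative rate only: the real per-video error rate is unknown.
    rate = 1e-6
    for n in (1_000, 1_000_000, 100_000_000):
        p = prob_at_least_one_error(rate, n)
        print(f"{n:>11,} items -> P(at least one error) = {p:.4f}")
```

At a platform's scale, "one in a million" stops meaning "never".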
If it were, we wouldn't expect this problem to occur, correct?
We don't have enough information to root cause the problem
No, that's not correct.
That makes it sound even worse that they knowingly released it without fixing it.
Are you saying an ML system should never be released if it doesn’t have perfect accuracy?
Tagging black people as monkeys is not a showstopper bug? If so, it makes them look even worse than if it was an overlooked bug.
I agree.

It does seem worse that way.

We don't actually know how to do that, or how rich is "rich enough." It's an open avenue of research to be able to extrapolate how well-tuned a neural net is on data not in its training set.

Not to imply the problem is unsolvable, just that if an institution has zero tolerance for this mistake, the fix you're describing is no guarantee it won't occur.

That’s not quite complete, right? It’s that we don’t know how to do that without sacrificing other things.
This reminds me of a favorite tweet from 2013: "Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there" -- https://twitter.com/alliebland/status/402990270402543616

Facebook, like a lot of tech companies, has long had problems with diversity in engineering. Here's an article from April that discusses specific incidents and the broader background: https://www.washingtonpost.com/technology/2021/04/06/faceboo...

This isn't a problem with diversity. Everybody knows how to pronounce Malcolm X. And it's not like just because a Google engineer was black, he'd be like "oh, let's try and see if Malcolm X is pronounced correctly because he's black and I'm black too". This only happens in white people's brain.
I don't know if I 100% agree with how you stated it, but yeah, it's a matter of the training data set. I don't think these companies have published their training data sets. But think back to the issue with Asians and facial recognition on Apple's Face ID. If they just chose 100 people at random, based on US statistics, 5-6 of those 100 people would be Asian, reflecting the 5.7 percent of the population that is Asian. We'd probably all agree that 5-6 people is not a sufficient data set, but picking 100 people at random would be a pretty easy assumption to make when building one.

So yeah, I think it is an issue with generating a data set and not hitting a sufficient number of test cases. In this instance, Asians would be an edge case: a group with lower representation in the population ends up with only a handful of examples in a small training set.
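The sampling arithmetic in this comment can be made concrete with the binomial distribution. This is a sketch under the comment's own hypothetical (100 people drawn independently at random from a population that is 5.7% Asian); nobody outside Apple knows how its set was actually built. The point it shows is that 5.7 is only the average: the spread is more than two people either way, so a random draw can easily land well below it.

```python
from math import comb, sqrt


def sample_stats(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of a group's count in n independent random draws."""
    mean = n * p
    std = sqrt(n * p * (1 - p))
    return mean, std


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial P(X <= k): chance a random sample of n contains at most k group members."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    n, p = 100, 0.057  # the comment's hypothetical: 100 people, 5.7% share
    mean, std = sample_stats(n, p)
    print(f"expected {mean:.1f} +/- {std:.1f} people from the group")
    print(f"P(3 or fewer) = {prob_at_most(3, n, p):.3f}")
```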

I wonder what the datasets of companies like Xiaomi look like. Face ID always worked for me, so it seems like it works for non-Asian faces.
Maybe they took more care with their data set. I think the only way we would know is if they published their sets or how they built them. But I was just highlighting one possible way Apple could have generated their training set: grab 100 people in America at random.
You realize that the person I'm quoting isn't white, right?

Let's separate the general case from the specific. Generally, we know that representation in the people who make things changes what they make. This is obvious and undeniable. For example, look at ASCII vs Unicode. The Chinese invented movable type 500 years before Gutenberg, so it's not like the idea of printing non-roman characters was novel. In the age of telegraphy, Europeans developed encodings that included umlauts and accents; by 1851 they were merged into International Morse Code.

So why in 1963 was ASCII codified without any of that? And why did that become the dominant standard for an extended period? Because it was mainly Americans in the rooms where the technology was being created.
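The ASCII gap is easy to demonstrate. ASCII is a 7-bit code with 128 slots, so accented Latin letters and Chinese characters have no representation in it at all; Latin-1 later added Western European accents, and Unicode (here via UTF-8) covers both. A small sketch using only Python's built-in codecs (the sample words are just illustrations):

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be represented in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # "毕昇" is Bi Sheng, credited with inventing movable type in China.
    for word in ("Gutenberg", "Müller", "café", "毕昇"):
        support = {enc: can_encode(word, enc) for enc in ("ascii", "latin-1", "utf-8")}
        print(word, support)
```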

Similarly, we know that standard color films were developed by white people to represent white people well: https://www.vox.com/2015/9/18/9348821/photography-race-bias

And we all know how this happens. It's the same reason a lot of open-source software is good for a developer audience, not an end-user one: making things means iterating on them until they're good enough for the people involved.

That's the general case, so let's return to the specific case. If you want to prove that ML systems doing racist stuff has nothing to do with who made it, then you can't just handwave it away. You have to show why that specific project was set up so carefully and so well that it would avoid the natural pitfalls of any technology project. And then despite that it went on to do racist stuff. For reasons that you'd then have to explain.

Considering the adversarial attacks that image recognition systems are vulnerable to, perhaps even a well-trained system could be induced to produce inappropriate results of one sort or another. Perhaps the training set and algorithm for the model should just be publicly available so that people can scrutinize the data and figure out incrementally how to avoid most biases or gaffes to a generally accepted level.
> This only happens in white people's brain.

'Eleven Jinping': Indian TV fires anchor over blooper.[1]

[1] https://www.bbc.com/news/world-asia-india-29274792

To play devil's advocate, maybe the station fired her for ignorance of current events.
> maybe the station fired her for ignorance of current events.

That would be a valid reason, but I suspect a more culturally appropriate one: loss of reputation. We are sensitive to that.

My point was this isn't something that only goes on in 'white' brains but more of a cultural issue. Most people in the West are incapable of pronouncing Asian names. I don't see people making a big issue out of it.

In what universe is "Ten" a more common pronunciation of "X" than "X"? You might have an argument for "II" or "III", but I'll be shocked if any street in the USA is named after the tenth generation of really unimaginative namers.
Do you think Google is having someone go through the tens of thousands of street names?

Or do you think they had a team (on a completely different project or perhaps company) write a text-to-speech function that wasn't well suited for directions?

Streets have lots of numbers after all. People frequently have numbers in their name.

I could see that for Google Maps v1.0. I think we're past that point now. There's no reason they should still be using libraries suited to parsing the names of forgotten European monarchs.
They’re neither forgotten, unused, nor is it a nomenclature used exclusively by Royals; nor are all the Royals that use this fashion dead or out of power.
Oh, for Pete's sake, absolute bloody conspiracy-level nonsense. NOBODY sat there twirling their villainous mustache and programmed an exception to hardcode pronouncing X as 10; it's simply a matter of the training and sample data having access to some type of corpus that contained a great deal of Roman numerals.

(Leave the software engineering to the software engineers)

>> to some type of corpus that contained a great deal of Roman numerals.

I wager that there is more text online about Louis XIV than about Malcolm X. Certainly there are many more books on that epic corner of French history than on one modern US leader. Then there are all the British kings. Point an AI at the internet and it would likely decide that Roman numerals are more often pronounced as numbers than as letters. Malcolm X would be a rare exception that might need to be hard-coded.
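A minimal sketch of the kind of rule being described here, assuming a naive regex-based text normalizer (nobody outside Google knows how its TTS front end actually works). The exception set, ordinal table, and function names are hypothetical. It only fires on uppercase numerals, in line with the case sensitivity reported for Google Translate further down the thread, and it reads regnal numbers as ordinals ("the Third") rather than cardinals.

```python
import re

# Hypothetical exception list: names where a trailing "X" is a letter, not a numeral.
LETTER_NOT_NUMERAL = {"malcolm x"}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
ORDINALS = ["", "First", "Second", "Third", "Fourth", "Fifth", "Sixth", "Seventh",
            "Eighth", "Ninth", "Tenth", "Eleventh", "Twelfth", "Thirteenth",
            "Fourteenth", "Fifteenth", "Sixteenth"]

# A capitalized word followed by an uppercase-only run of Roman numeral letters.
NAME_THEN_NUMERAL = re.compile(r"\b([A-Z][a-z]+) ([IVXLCDM]+)\b")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral, honoring subtractive pairs such as IV and XL."""
    total = 0
    for current, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[current]
        if following in ROMAN_VALUES and ROMAN_VALUES[following] > value:
            total -= value
        else:
            total += value
    return total


def expand_regnal_numbers(text: str) -> str:
    """Rewrite 'Louis XIV' as 'Louis the Fourteenth' unless the name is an exception."""
    def replace(match: re.Match) -> str:
        name, numeral = match.group(1), match.group(2)
        if f"{name} {numeral}".lower() in LETTER_NOT_NUMERAL:
            return match.group(0)  # keep the letter as written
        value = roman_to_int(numeral)
        if 0 < value < len(ORDINALS):
            return f"{name} the {ORDINALS[value]}"
        return match.group(0)  # e.g. "DC" (600): leave it alone
    return NAME_THEN_NUMERAL.sub(replace, text)
```

Even this toy shows why the rule is hard to get right: it would read "Then I turned" as "Then the First turned", and the exception list only ever grows.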

For sure. If we're going with the common pronunciation of Roman numerals in English names, it's "Tenth". E.g., we don't say "Henry Ford III" as "Henry Ford Three" but "Henry Ford the Third".
There’s a Louis XIV Street in New Orleans (and I imagine elsewhere).
You mean Louis 'Ziv', according to Google
Putting Louis XIV in Google Translate, I get the correct "Louis the Fourteenth" and "Louis Quatorze" pronunciations in English and French, respectively. However, it has to be uppercased; otherwise it spells out the letters.
The implication is that a black person would be more likely to recognize the inherent flaw in automatically interpreting "X" as "10", and in all honesty that's probably true. It isn't a matter of testing, it's a matter of having people with a diverse set of cultural perspectives in the room when decisions like that are made to begin with.
Diversity doesn't guarantee that you automatically catch or account for edge cases. As a minority, I am disturbed by some of the odd takes people have about diversity. There are thousands upon thousands of roads. Unless you have a QA team test directions to every road in the country, you won't ever catch the issue with a road named Malcolm X. You don't even have to be 'diverse' to know who that is.
It doesn't guarantee it, but it helps.

I personally have gotten bugs fixed at Google. How? Because I, a white man, spotted a bug, cared about it, and talked to white men of my acquaintance at Google who had enough power to get things done. How did I know them? From other tech companies created, run, and majority staffed by other white men.

Why am I in these networks at all? Well, my dad was a software developer and he introduced me early on. How did he get his start? His dad, an insurance company exec, brought him in to deal with this newfangled computer thing they had just gotten. That was in Milwaukee in the mid-1960s. I promise you that although Milwaukee had a significant black population, exactly zero of them were insurance company executives in the mid-1960s.

So what Allie Bland knew when she wrote her tweet was that she had no connection to Google through which she might get a pronunciation issue, glaringly obvious to her, fixed. In her estimation, no black person did. And I see no reason to think she was wrong.

This is a contrarian take that may get me downvoted and unfairly labeled, but I encourage critical thinking instead:

I've struggled with people telling me that these FAANG companies have "diversity problems," as a person of color myself. A majority of software engineers are female and male immigrants from East Asia and South Asia. These population centers are some of the most diverse regions of the world. The engineers who have been hired by preparing for and passing these companies' selective merit based coding tests had to overcome adverse conditions in their home countries as well, including extreme poverty, starvation, and totalitarian regimes.

Why do they not count toward diversity, to some white and white-adjacent critics? What message are we sending to people who are ethnic minorities from certain groups who earned their spots through merit and have also been targeted in recent newsworthy attacks, just as others have, when we make these kinds of accusations? What does a non problematic ethnic composition look like? What are these companies doing right toward some minority groups and wrong towards others?

There is literally no right answer; the very nature of modern diversity is that it will always be a moving target, at least until we get over the entire concept of diversity, which is racist / discriminatory at its core.
That's incorrect. The main use of diversity is in an antiracist fashion. I'd suggest you read one of Kendi's books. Stamped from the Beginning has clear and readable descriptions of the difference, but it's a relatively long work, so you might start with one of his shorter books.
Instead of dismissing the argument with a tawdry negated statement and a book suggestion, do you have some thoughts of your own with this matter, or at least some kind of summary?
No.

Long ago I learned that it was rarely worth my time to try to argue people online out of their ignorance. A rando with a throwaway account, a strident tone, and a fair bit of ignorance on the topic is almost a guarantee that there's no point.

If you're interested in knowing something about the topic, you'll do some work. If you aren't, no amount of me spoon-feeding you summaries of serious scholarly works will change that.

If you do end up learning something and have questions, feel free to email me. I'm glad to discuss the topics with people who are serious about it.

South / East Asia has more than half the world’s population yet doesn’t count towards diversity.
Why not? My point is that it should! What percentage of the US population is from South / East Asia? How does it compare to the representation of others? If it's similar or less, and it still somehow doesn't "count," then we have a diversity problem.
Nobody is saying they don't count toward diversity. What people are saying is that the conspicuous exclusion of less favored racial groups does not get erased because they have some people from other groups.

Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans. The latter is a problem that we have to solve regardless.

And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...

> Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans.

But given that America was far more brutal and exploitative towards Chinese immigrants than towards Latin Americans, why are Latinos so prioritized by these initiatives to favor certain racial groups?

> And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...

Ironic that in a discussion about diversity, you believe in a prejudiced stereotype about a major ethnic group in Silicon Valley. Casteism is pretty much a nonissue in Silicon Valley, if only for the simple reason that most Indian-Americans tend to be ignorant about the castes of most other Indian-Americans.

Sure, and we are discussing the existence of racial discrimination in engineering hiring at top tech companies, not American history or South Asian culture. Asian immigrants on H1-B conducting coding tests as interviewers at FAANG did not involve themselves in the American Jim Crow South, for example. It's saddening to see America's own past being used to justify discrimination in the present, even against people who aren't originally from the US.

You might not share the beliefs of others that are gainfully rallying behind diversity as a cause to justify penalizing some minority groups for "doing too well" and bolstering others (the literal definition of discrimination), but it IS happening -- and certainly more people than "nobody" are backing it, provoking my original statements. Someone had to put Prop 16 on the ballot, for example (which was thankfully voted against by a large margin of fellow CA Democrats).

The notion that American tech companies are somehow entirely separate from and unrelated to American history is quite a belief to hold. It's not one that stands up to any understanding of the topic, alas. But since that's a hill you've chosen to die on, I'll leave you to it.
The short answer is that tech companies run diversity programs for three reasons: they believe in righting wrongs, they don't want to be sued for biased hiring practices, and they don't want bad PR. All three require under-represented minorities.

Turning it back on you, what should the point of a diversity program be? What's meant to be achieved outside those three goals?

While I certainly understand bad PR (a surprising number of people lack critical thinking skills), what is wrong or biased about hiring for coding positions based on merit-based performance on an objective coding test? Anybody regardless of background or group membership that passes will be hired, meaning it is fair and unbiased, by definition -- that is the diversity program, and if there is some lack of objectivity, that is what needs to be addressed. If that is not the case, then yes, I agree with you, the hiring process would be biased.
You really need to interrogate "merit" and "objective". Nominally objective standards have long been used to advance racial discrimination in the US. For example: https://en.wikipedia.org/wiki/Literacy_test

You should also look up the extensive critiques of meritocracy as a concept. There's a lot of literature there.

Further, I know of no major tech company who uses a nominally "objective coding test" as the only criterion for hiring. And they shouldn't, because being good at taking coding tests is not the job and not what we should be hiring for.

No, coding tests are not the "literacy tests" you have described, and if they were, why would some minorities be performing even better than Caucasians on them?

Coding tests examine the type of work actually required to be done on the job (as coders), and they have been correlated with post-hire performance successfully. Someone who is not familiar with efficient data structures will not write scalable code and will end up creating a burden on their teammates during on-call, for example. Asking someone to solve an engineering problem with a provably correct answer is an objective test for hiring engineers, and I will have a difficult time continuing to engage with anyone who counteracts this basis of reality and truth.

When I was hired there were three coding test rounds and one interpersonal round. You might argue that the latter is where racial discrimination seeps in, as well as the recruiter outreach step itself, but somehow I am optimistic that a bunch of tolerant Californians have moved past applying a Literacy Test here already by hiring a majority immigrant / minority workforce. In my situation, my recruiter was also an Asian-American minority.

Actually listening to minorities instead of summoning some kind of sick quota for different ethnicities. Racists are in stark decline and it didn't even take a diversity program or a change in language rules.

The companies are then righting the wrongs on the shoulders of innocents, that most likely never were racists to begin with. In short, just committing to another mistake.

> These population centers are some of the most diverse regions of the world.

South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) has actually one of the most ethnically "pure" populations in the world.

> South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) has actually one of the most ethnically "pure" populations in the world.

Northeast China, usually defined as the provinces of Liaoning, Jilin, and Heilongjiang, does not belong on your list. According to the 2000 Chinese census, about 10% of the population of Northeast China comes from ethnic minorities: the majority of them Manchus, but also including significant numbers of Mongols and Koreans. That is far from being 'one of the most ethnically "pure" populations in the world', especially when compared to Japan or Korea.

Indeed, even though Northeast China was (in 2000) approximately 90% Han, prior to the 19th century Han were a minority in the region, and Manchus were the numerically (and politically) dominant ethnic group.

According to the 2000 Census, the most ethnically homogeneous part of China is not the North or Northeast, but rather Eastern China, which is over 99% Han (and, as well as being over 99% Han overall, 4 of its 7 provinces are individually over 99% Han too). By contrast, North China is about 94% Han and Northeast China is only around 90% Han.

(There have been two Chinese censuses since, in 2010 and 2020, but I can't find ethnicity figures for them.)

This is so ridiculously ignorant.
Your comment implies black engineers will check that Malcolm X Boulevard is pronounced correctly. That's awfully specious.
Alternatively it just implies white engineers never have their GPS taking them through Harlem.
Yes, all engineers are white </sarcasm>.

Or how about this one: Yes, all black engineers on the maps team live in New York.

Truth is that it is just one example of the thousands of edge cases that exist in these types of complex products, and some of them will look like they have some sinister basis.

Or that Google Maps is primarily developed in Australia for a worldwide market.
Did the geo team get moved from Seattle?
It really doesn't. There are more things, Horatio.
As others noted, just because someone is black doesn't mean that they would have caught this. The whole point of ML is to adapt to what is effectively an unbounded set of inputs, pretty much by definition there will be cases where even a team of 100% black people will train a model that, given the correct input, will fail in ways that particularly affect black people.
> Facebook, like a lot of tech companies, has long had problems with diversity in engineering.

If that is the case, why is it that Google voice nav routinely butchers the names of places and roads in India in spite of having thousands of Indian engineers on staff?

Could we blame the intractability of the problem, or just plain old incompetence, before we blame every single problem in the world on racism and lack of 'diversity'?

Strong agreement here. The impulse to attribute any mishap on anything race-adjacent to racism is one of the most destructive memes at the moment.

It forces a worldview where malice is the default assumption and encourages the "enemies all around us" mindset.

Another example: Apple Maps pronounces “Jai Ho” as “high hoe”. Apparently Apple has too many Latino engineers and not enough Indian engineers?
Maybe, but in the particular case you mentioned there is a specific word, "jai", that is pronounced as "high". See Jai Alai, which has been absorbed into English.
Given that the goal of racism is to structure society, and given how well that succeeded in America, I don't think it's unreasonable to ask whether it's at play in pretty much any situation where we see racially biased outcomes.

But it is an excellent question why Google Maps is still terrible at Indian place names even though they have plenty of people internally who not only could help, but would be delighted to. The answer to that will be essentially sociological. If you think that answer in no way includes structural inequity despite it being pervasive in America since its founding, you will have to explain how you think Google managed to eliminate that in the Maps division and then managed to re-introduce some sort of structure that leaves a wealth of internal knowledge untapped.

> structural inequity despite it being pervasive in America since its founding

America is not unique in this. And African-Americans are not the only people in the world who were enslaved. What is unique is that America and Americans are so good at controlling narratives and sucking oxygen out of rooms that other stories and catastrophes are forced into irrelevance.

America is not unique in this. But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.
> But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.

Sure it is. But if Facebook, Google and other American companies want to indulge their Americentric proclivities to the detriment of everyone else, they should voluntarily withdraw from the rest of the world.

> Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there

Silly Google TTS, the proper pronunciation is obviously "Malcolm the Tenth" there.

Google Maps is made in Australia, and the diversity there is different.
Google Photos solved the problem by simply returning no results for words like gorilla, monkey, primate, etc.
I was just thinking about that. Unfortunately it just makes the bias harder to detect.

Once you search for these:

https://www.google.com/search?q=human+female+face&tbm=isch

https://www.google.com/search?q=human+male+face&tbm=isch

You can see that 'human face' has a bit of post-hoc tuning.

https://www.google.com/search?q=human+face&tbm=isch

So disappointing. I was legitimately looking for a monkey pic I took years ago to no avail because of no searchability. One of the richest companies in the world prefers to just remove ability than to solve hard problems. But hey, at least we all get ads.
It’s an inevitable result of angry mobs (like this article and entire HN thread) and risk-intolerant corporations.

It’s impossible to test every image for accuracy and to guarantee it won’t happen again, so they just sidestep it entirely.

But what would you do if you were them? You solved 95% of the problem, you are left with 5% that are extremely hard. Would you throw a large amount of resources to solve that? And, given that you basically deal with probabilities and that the system will never work 100% anyway, and that even one mistake of that kind will cause uproar - is there any other feasible solution?
You can't just "test" a neural network like that. For all you know they tested a thousand pictures of Chimpanzees and Gorillas against the network, but for some reason the NN decided to classify the photo differently because the subject was standing in front of the wrong kind of tree or wearing a funny-colored hat.

There's no super reliable way to prevent this (with current tech) other than forbidding that output entirely.
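Forbidding the output can be as blunt as a post-processing filter in front of the search index. A minimal sketch, assuming a classifier that returns (label, score) pairs; the label names, threshold, and function name are hypothetical and only modeled on the reported Google Photos fix, not taken from it.

```python
from collections.abc import Iterable

# Hypothetical blocklist: labels that are suppressed no matter how confident the model is.
BLOCKED_LABELS = frozenset({"gorilla", "chimpanzee", "monkey", "ape", "primate"})


def filter_labels(predictions: Iterable[tuple[str, float]],
                  threshold: float = 0.5,
                  blocked: frozenset[str] = BLOCKED_LABELS) -> list[str]:
    """Keep labels at or above threshold, dropping any on the blocklist."""
    return [label for label, score in predictions
            if score >= threshold and label.lower() not in blocked]


if __name__ == "__main__":
    raw = [("gorilla", 0.91), ("tree", 0.80), ("hat", 0.20)]
    print(filter_labels(raw))
```

The trade-off is exactly the one complained about elsewhere in the thread: the filter never looks at the image, so a genuine photo of a monkey loses its label too.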

Is it inexcusable that if I search 'Japan' to look for pics from my trip to Japan, it shows me pictures containing any Asian person at all? If I search Japan today, I get mostly pics of my not Japanese wife. But I guess we don't complain enough for anyone to care.

https://i.ibb.co/Mf6rVdf/Screenshot-20210907-002516-Photos.j...

Nobody who has traveled at all would mistake my wife and child as Japanese. And doing so is especially insidious considering the Bataan death march.

> Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?

They probably are, but not well enough. These things can be surprisingly hard to detect. Post hoc it is easy to see the bias, but it isn't so easy before you deploy the models.

If we take racial connotations out of it then we could say that the algorithm is doing quite well because it got the larger hierarchical class correct, primate. The algorithm doesn't know the racial connotations, it just knows the data and what metric you were seeking. BUT considering the racial and historical context this is NOT an acceptable answer (not even close).

I've made a few comments in the past about bias and how many machine learning people are deploying models without understanding them. This is what happens when you don't try to understand statistics, particularly long-tail distributions. gumboshoes mentioned that Google just removed the primate-type labels. That's a solution, but honestly not a great one (technically speaking). But this solution is far easier than technically fixing the problem (I'd wager that putting a strong loss penalty on misclassifying a black person as an ape is not enough). If you follow the links from jcims, you might notice that a lot of those faces are white. Would it be all that surprising if Google trained on the FFHQ (Flickr) Dataset?[0] It's a dataset known to have a strong bias towards white faces. We actually saw that when PULSE[1] turned Obama white (do note that if you didn't know the left picture was of a black person, or who he was, the output is a decent (key word) representation). So it is pretty likely that _some_ problems could simply be fixed by better datasets (this was part of the LeCun controversy last year).
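For readers wondering what a "strong loss penalty" for one specific confusion looks like, here is a sketch of a generic cost-sensitive loss: cross-entropy plus the expected cost of the predicted distribution under a cost matrix. It is not anything Google is known to use, the comment itself wagers it would not be enough on its own, and the three-class label set and weights are hypothetical.

```python
import math

CLASSES = ["person", "gorilla", "tree"]  # hypothetical label set

# COST[true][predicted]: extra penalty for probability mass placed on `predicted`.
# The 100.0 entry makes person -> gorilla far costlier than any other mistake.
COST = [
    [0.0, 100.0, 1.0],  # true: person
    [1.0, 0.0, 1.0],    # true: gorilla
    [1.0, 1.0, 0.0],    # true: tree
]


def softmax(logits: list[float]) -> list[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss(logits: list[float], target: int,
                        cost: list[list[float]] = COST) -> float:
    """Cross-entropy plus the expected misclassification cost under `cost`."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    expected_cost = sum(p * c for p, c in zip(probs, cost[target]))
    return cross_entropy + expected_cost


if __name__ == "__main__":
    # Same confidence pattern, different wrong class: only the cost term differs.
    print(cost_sensitive_loss([1.0, 2.0, 0.0], target=0))  # mass leaning to "gorilla"
    print(cost_sensitive_loss([1.0, 0.0, 2.0], target=0))  # mass leaning to "tree"
```

The penalty only shapes training on examples the model actually sees; it does nothing about the long tail of inputs that are missing from the data, which is the comment's larger point.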

Though datasets aren't the only problems here. ML can algorithmically highlight bias in datasets. Research papers are often metric hacking, going for the highest accuracy that they can get[2]. This leaderboardism undermines some of the usage, and often there's a disconnect between researchers and those in production. With large and complex datasets we might be targeting leaderboard scores until we have sufficient accuracy on that dataset before we start focusing on bias in that dataset (or more often we, sadly, just move to a more complex dataset and start the whole process over again). There aren't many people working on the biased aspects of ML systems (both data bias and algorithmic bias), but as more people put these tools into production we're running into walls. Many of these people are not thinking about how these models are trained or the bias that they contain. They go to the leaderboard, pick the best pre-trained model, and hit go, maybe tuning on their dataset. Tuning doesn't eliminate the bias in the pre-training (it can actually amplify it!). ~~Money~~Scale is NOT all you need, as GAMF often tries to sell. (Or some try to sell augmentation as all you need.)

These problems won't be solved without significant research into both data and algorithmic bias. They won't be solved until those in production also understand these principles and robust testing methods are created to find these biases, and until people understand that a good ImageNet (or even JFT-300M) score doesn't mean your model will generalize well to real-world data (though there is a correlation).

So with that in mind, I'll make a prediction: rather than seeing fewer cases of these mistakes, we're going to see more (I'd actually argue that a lot of this is happening right now that you just don't see). The AI hype isn't dying down, and more people are entering the field who don't want to learn the math. "Throw a neural net at it" is not and never will be the answer. Anyone saying that is selling snake oil.

I don't want people to think I'm anti-ML. In fact, I'm an ML researcher. But there's a hard reality we need to face in our field. We've made a lot of progress in the last decade that is very exciting, but we've got a long way to go as well. We can't just have everyone focusing on leaderboard scores and expect to solve our problems.

[0] https://github.com/NVlabs/ffhq-dataset

[1] https://twitter.com/Chicken3gg/status/1274314622447820801

[2] https://twitter.com/emilymbender/status/1434874728682901507

>how are you not testing for this?

I wonder what testing for that looks and sounds like in a corporate environment. It may well be an area similar to patents: you pretend that you never heard of it, never discussed it, God forbid any mention in corporate email/chat/etc. or clicking on a link from inside a corporate network...

Why are you so sure they aren't testing for it? Bias finds a way.
Curious if anyone on HN has built a testing framework to catch this kind of issue.
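No particular framework is named in this thread, but the usual starting point is disaggregated evaluation: slice the labeled test set by group, compare error rates across slices, and treat certain confusions as release blockers regardless of how rare they are. A minimal sketch under those assumptions; the group labels, forbidden pairs, gap threshold, and names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    group: str            # slice label attached by the evaluator (hypothetical)
    true_label: str
    predicted_label: str


# Hypothetical confusions treated as release blockers no matter how rare.
FORBIDDEN = {("person", "gorilla"), ("person", "monkey"), ("person", "primate")}


def error_rate_by_group(examples: list[Example]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for ex in examples:
        totals[ex.group] += 1
        if ex.predicted_label != ex.true_label:
            errors[ex.group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def check_release(examples: list[Example], max_gap: float = 0.02) -> list[str]:
    """Return the problems found; an empty list only means this limited check passed."""
    problems = []
    forbidden = [ex for ex in examples if (ex.true_label, ex.predicted_label) in FORBIDDEN]
    if forbidden:
        problems.append(f"{len(forbidden)} forbidden confusion(s), e.g. {forbidden[0]}")
    rates = error_rate_by_group(examples)
    if rates and max(rates.values()) - min(rates.values()) > max_gap:
        problems.append(f"error-rate gap across groups exceeds {max_gap}: {rates}")
    return problems
```

As pointed out upthread, this only catches failures that are present in the labeled evaluation set; a photo with the "wrong kind of tree" in the background that nobody thought to include will sail straight through.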