by wpietri 1746 days ago
This reminds me of a favorite tweet from 2013: "Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there" -- https://twitter.com/alliebland/status/402990270402543616

Facebook, like a lot of tech companies, has long had problems with diversity in engineering. Here's an article from April that discusses specific incidents and the broader background: https://www.washingtonpost.com/technology/2021/04/06/faceboo...

7 comments

This isn't a problem with diversity. Everybody knows how to pronounce Malcolm X. And it's not as if a Google engineer, just because he was black, would be like "oh, let's try and see if Malcolm X is pronounced correctly because he's black and I'm black too". This only happens in white people's brain.
I don't know if I 100% align with how you stated it, but yeah, it's a matter of the training data set. I don't think these companies have published their training data sets. But think back to the issue with Asians and facial recognition on Apple's Face ID. If they just chose 100 people at random, based on US statistics, 5-6 of those 100 people would have been Asian, reflecting the 5.7 percent of the population that is Asian. We probably all agree 5-6 people is not a sufficient data set, but picking 100 people at random would be a pretty easy assumption to make when building a data set.

So yeah, I think it is an issue with generating a data set and not covering a sufficient number of test cases. In this instance, Asians would be an edge case: a small data set drawn at random to train an algorithm will contain very few people from a group with lower representation in the population.
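
To put rough numbers on the sampling argument above, here is a minimal sketch. It assumes a simple random sample and the 5.7% share quoted in the comment; the function names and the target of 100 examples per group are hypothetical, chosen only for illustration, and say nothing about how any company actually built its data.

```python
from math import ceil, comb

def expected_count(n, p):
    """Expected number of sampled people from a group with population share p."""
    return n * p

def prob_at_most(n, k, p):
    """Binomial probability that a random sample of n contains at most k group members."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def sample_size_for(target, p):
    """Smallest n whose expected group count reaches `target` (hypothetical goal)."""
    return ceil(target / p)

n, share = 100, 0.057  # figures taken from the comment above
print("expected group members in the sample:", expected_count(n, share))
print("chance of 2 or fewer:", prob_at_most(n, 2, share))
print("sample size for ~100 expected members:", sample_size_for(100, share))
```

The alternative to simply enlarging a random sample would be stratified sampling, where each group is sampled to a fixed quota regardless of its population share.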

I wonder what the datasets of companies like Xiaomi look like. FaceID always worked for me, so it seems like it works for non-Asian faces.
Maybe they took more care with their data set. I think the only way we would know is if they published their sets or how they built them. I was just highlighting one possible way Apple could have generated their training set: just grab 100 people in America at random.
You realize that the person I'm quoting isn't white, right?

Let's separate the general case from the specific. Generally, we know that representation in the people who make things changes what they make. This is obvious and undeniable. For example, look at ASCII vs Unicode. The Chinese invented movable type roughly 400 years before Gutenberg, so it's not like the idea of printing non-Roman characters was novel. In the age of telegraphy, Europeans developed encodings that included umlauts and accents; by 1851 they were merged into International Morse Code.

So why in 1963 was ASCII codified without any of that? And why did that become the dominant standard for an extended period? Because it was mainly Americans in the rooms where the technology was being created.

Similarly, we know that standard color films were developed by white people to represent white people well: https://www.vox.com/2015/9/18/9348821/photography-race-bias

And we all know how this happens. It's the same reason a lot of open-source software is good for a developer audience, not an end-user one: making things means iterating on them until they're good enough for the people involved.

That's the general case, so let's return to the specific case. If you want to prove that ML systems doing racist stuff has nothing to do with who made it, then you can't just handwave it away. You have to show why that specific project was set up so carefully and so well that it would avoid the natural pitfalls of any technology project. And then despite that it went on to do racist stuff. For reasons that you'd then have to explain.

Considering the adversarial attacks that image recognition systems are vulnerable to, perhaps even a well-trained system could be induced to produce inappropriate results of one sort or another. Perhaps the training set and algorithm for the model should just be publicly available so that people can scrutinize the data and figure out incrementally how to avoid most biases or gaffes to a generally accepted level.
> This only happens in white people's brain.

'Eleven Jinping': Indian TV fires anchor over blooper.[1]

[1] https://www.bbc.com/news/world-asia-india-29274792

To play devil's advocate, maybe the station fired her for ignorance of current events.
> maybe the station fired her for ignorance of current events.

That would be a valid reason, but I suspect a more culturally appropriate one: loss of reputation. We are sensitive to that.

My point was this isn't something that only goes on in 'white' brains but more of a cultural issue. Most people in the West are incapable of pronouncing Asian names. I don't see people making a big issue out of it.

In what universe is "Ten" a more common pronunciation of "X" than "X"? You might have an argument for "II" or "III", but I'll be shocked if any street in the USA is named after the tenth generation of really unimaginative namers.
Do you think Google is having someone go through the tens of thousands of street names?

Or do you think they had a team (on a completely different project or perhaps company) write a text-to-speech function that wasn't well suited for directions?

Streets have lots of numbers after all. People frequently have numbers in their name.

I could see that for Google Maps v1.0. I think we're past that point now. There's no reason they should still be using libraries suited to parsing the names of forgotten European monarchs.
They’re neither forgotten nor unused, nor is this a nomenclature used exclusively by royals; nor are all the royals who use it dead or out of power.
Oh for Pete's sake, this is absolute bloody conspiracy-level nonsense. NOBODY sat there twirling their villainous mustache and programmed an exception to hardcode pronouncing X as 10. It's simply a matter of the training and sample data having access to some type of corpus that contained a great deal of Roman numerals.

(Leave the software engineering to the software engineers)

>> to some type of corpus that contained a great deal of Roman numerals.

I wager that there is more text online about Louis XIV than about Malcolm X. Certainly there are many more books on that epic corner of French history than on one modern US leader. Then there are all the British kings. Point an AI at the internet and it would likely decide that Roman numerals are more often pronounced as numbers than as letters. Malcolm X would be a rare exception that might need to be hard-coded.

For sure. If we're going with the common pronunciation of Roman numerals in English names, it's "Tenth". E.g., we don't say "Henry Ford III" as "Henry Ford Three" but "Henry Ford the Third".
There’s a Louis XIV Street in New Orleans (and I imagine elsewhere).
You mean Louis 'Ziv', according to Google
Putting Louis XIV in Google Translate, I get the correct "Louis the Fourteenth" and "Louis Quatorze" pronunciations in English and French, respectively. However, it has to be uppercased, otherwise it spells the letters.
The implication is that a black person would be more likely to recognize the inherent flaw in automatically interpreting "X" as "10", and in all honesty that's probably true. It isn't a matter of testing, it's a matter of having people with a diverse set of cultural perspectives in the room when decisions like that are made to begin with.
Diversity doesn't guarantee you automatically catch or account for edge cases. As a minority I am disturbed by some of the odd takes people have about diversity. There are thousands upon thousands of roads. Unless you have a QA team test directions to every road in the country you won't ever catch the issue with a road named Malcolm X. You don’t even have to be ‘diverse’ to know who that is.
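
For what the Roman numeral subthread above is describing, here is a minimal sketch of a rule-based text normalizer with a hand-maintained exception list. Everything in it (the `EXCEPTIONS` set, the `normalize` function, the 1-20 ordinal table, the regex) is hypothetical and written only to illustrate the idea; it is not how Google Maps or any real TTS front end is known to work.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

ORDINAL_WORDS = [
    "First", "Second", "Third", "Fourth", "Fifth", "Sixth", "Seventh",
    "Eighth", "Ninth", "Tenth", "Eleventh", "Twelfth", "Thirteenth",
    "Fourteenth", "Fifteenth", "Sixteenth", "Seventeenth", "Eighteenth",
    "Nineteenth", "Twentieth",
]
ORDINALS = {i + 1: word for i, word in enumerate(ORDINAL_WORDS)}

# Hypothetical hand-maintained list of names where the letter is NOT a numeral.
EXCEPTIONS = {"Malcolm X"}

# A capitalized word followed by an all-uppercase run of Roman numeral letters.
NAME_NUMERAL = re.compile(r"\b([A-Z][a-z]+) ([IVXLCDM]+)\b")

def roman_to_int(numeral):
    """Convert a Roman numeral string to an integer (no validity checking)."""
    total, prev = 0, 0
    for ch in reversed(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value to the left of a larger one is subtracted (IV = 4).
        total += -value if value < prev else value
        prev = value
    return total

def normalize(text):
    """Expand 'Name XIV' to 'Name the Fourteenth' unless the name is an exception."""
    def expand(match):
        if match.group(0) in EXCEPTIONS:
            return match.group(0)
        ordinal = ORDINALS.get(roman_to_int(match.group(2)))
        if ordinal is None:  # outside the small table: leave untouched
            return match.group(0)
        return f"{match.group(1)} the {ordinal}"
    return NAME_NUMERAL.sub(expand, text)

print(normalize("Turn right on Malcolm X Boulevard"))
print(normalize("Turn left on Louis XIV Street"))
print(normalize("Henry Ford III"))
```

The rule is deliberately naive: without more exceptions it would also rewrite phrases like "Generation X" or a capitalized word followed by the pronoun "I", which is the kind of edge case the thread is arguing about.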
It doesn't guarantee it, but it helps.

I personally have gotten bugs fixed at Google. How? Because I, a white man, spotted a bug, cared about it, and talked to white men of my acquaintance at Google who had enough power to get things done. How did I know them? From other tech companies created, run, and majority staffed by other white men.

Why am I in these networks at all? Well, my dad was a software developer and he introduced me early on. How did he get his start? His dad, an insurance company exec, brought him in to deal with this newfangled computer thing they had just gotten. That was in Milwaukee in the mid-1960s. I promise you that although Milwaukee had a significant black population, exactly zero of them were insurance company executives in the mid-1960s.

So what Allie Bland knew when she wrote her tweet was that she did not have any connection to Google where she might be able to get a to-her glaringly obvious pronunciation issue fixed. That in her estimation no black person did. And I see no reason to think she was wrong.

This is a contrarian take that may get me downvoted and unfairly labeled, but I encourage critical thinking instead:

I've struggled with people telling me that these FAANG companies have "diversity problems," as a person of color myself. A majority of software engineers are female and male immigrants from East Asia and South Asia. These population centers are some of the most diverse regions of the world. The engineers who have been hired by preparing for and passing these companies' selective merit-based coding tests had to overcome adverse conditions in their home countries as well, including extreme poverty, starvation, and totalitarian regimes.

Why do they not count toward diversity, to some white and white-adjacent critics? What message are we sending to people who are ethnic minorities from certain groups who earned their spots through merit and have also been targeted in recent newsworthy attacks, just as others have, when we make these kinds of accusations? What does a non problematic ethnic composition look like? What are these companies doing right toward some minority groups and wrong towards others?

There is literally no right answer; the very nature of modern diversity is that it will always be a moving target. That is, until we get over the entire concept of diversity, which is racist / discriminatory at its core.
That's incorrect. The main use of diversity is in an antiracist fashion. I'd suggest you read one of Kendi's books. Stamped from the Beginning has clear and readable descriptions of the difference, but it's a relatively long work, so you might start with one of his shorter books.
Instead of dismissing the argument with a tawdry negated statement and a book suggestion, do you have some thoughts of your own with this matter, or at least some kind of summary?
No.

Long ago I learned that it was rarely worth my time to try to argue people online out of their ignorance. A rando with a throwaway account, a strident tone, and a fair bit of ignorance on the topic is almost a guarantee that there's no point.

If you're interested in knowing something about the topic, you'll do some work. If you aren't, no amount of me spoon-feeding you summaries of serious scholarly works will change that.

If you do end up learning something and have questions, feel free to email me. I'm glad to discuss the topics with people who are serious about it.

What distinction does a throwaway account make on an otherwise-anonymous online forum? No need to take discussion offline. Within the next decade, I am confident the pendulum will swing the other way, and the people who are able to vocalize their opinions now in public will be the ones needing throwaway accounts.
South / East Asia has more than half the world’s population yet doesn’t count towards diversity.
Why not? My point is that it should! What percentage of the US population is from South / East Asia? How does it compare to the representation of others? If it's similar or less, and it still somehow doesn't "count," then we have a diversity problem.
Nobody is saying they don't count toward diversity. What people are saying is that the conspicuous exclusion of less favored racial groups does not get erased because they have some people from other groups.

Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans. The latter is a problem that we have to solve regardless.

And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...

> Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans.

But given that America was far more brutal and exploitative towards Chinese immigrants than towards Latin Americans, why are Latinos so prioritized by these initiatives to favor certain racial groups?

> And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...

Ironic that in a discussion about diversity, you believe in a prejudiced stereotype about a major ethnic group in Silicon Valley. Casteism is pretty much a nonissue in Silicon Valley, if only for the simple reason that most Indian-Americans tend to be ignorant about the castes of most other Indian-Americans.

Sure, and we are discussing the existence of racial discrimination in engineering hiring at top tech companies, not American history or South Asian culture. Asian immigrants on H-1B visas conducting coding tests as interviewers at FAANG did not involve themselves in the American Jim Crow South, for example. It's saddening to see America's own past being used to justify discrimination in the present, even against people who aren't originally from the US.

You might not share the beliefs of others that are gainfully rallying behind diversity as a cause to justify penalizing some minority groups for "doing too well" and bolstering others (the literal definition of discrimination), but it IS happening -- and certainly more people than "nobody" are backing it, provoking my original statements. Someone had to put Prop 16 on the ballot, for example (which was thankfully voted against by a large margin of fellow CA Democrats).

The notion that American tech companies are somehow entirely separate from and unrelated to American history is quite a belief to hold. It's not one that stands up to any understanding of the topic, alas. But since that's a hill you've chosen to die on, I'll leave you to it.
The short answer is that tech companies run diversity programs for three reasons: they believe in righting wrongs, they don't want to be sued for biased hiring practices, and they don't want bad PR. All three require under-represented minorities.

Turning it back on you, what should the point of a diversity program be? What's meant to be achieved outside those three goals?

While I certainly understand bad PR (a surprising number of people lack critical thinking skills), what is wrong or biased about hiring for coding positions based on merit-based performance on an objective coding test? Anybody regardless of background or group membership that passes will be hired, meaning it is fair and unbiased, by definition -- that is the diversity program, and if there is some lack of objectivity, that is what needs to be addressed. If that is not the case, then yes, I agree with you, the hiring process would be biased.
You really need to interrogate "merit" and "objective". Nominally objective standards have long been used to advance racial discrimination in the US. For example: https://en.wikipedia.org/wiki/Literacy_test

You should also look up the extensive critiques of meritocracy as a concept. There's a lot of literature there.

Further, I know of no major tech company that uses a nominally "objective coding test" as the only criterion for hiring. And they shouldn't, because being good at taking coding tests is not the job and not what we should be hiring for.

No, coding tests are not the "literacy tests" you have described, and if they were, why would some minorities be performing even better than Caucasians on them?

Coding tests examine the type of work actually required to be done on the job (as coders), and they have been correlated with post-hire performance successfully. Someone who is not familiar with efficient data structures will not write scalable code and will end up creating a burden on their teammates during on-call, for example. Asking someone to solve an engineering problem with a provably correct answer is an objective test for hiring engineers, and I will have a difficult time continuing to engage with anyone who counteracts this basis of reality and truth.

When I was hired there were three coding test rounds and one interpersonal round. You might argue that the latter is where racial discrimination seeps in, as well as the recruiter outreach step itself, but somehow I am optimistic that a bunch of tolerant Californians have moved past applying a Literacy Test here already by hiring a majority immigrant / minority workforce. In my situation, my recruiter was also an Asian-American minority.

I didn't say coding tests were literacy tests. You also seem a lot like somebody who has not hired people, which would explain your poor understanding of how hiring actually works.

Since my comments here don't seem to be making any sense to you, I'm not seeing the point in trying again.

Actually listening to minorities instead of summoning some kind of sick quota for different ethnicities. Racists are in stark decline and it didn't even take a diversity program or a change in language rules.

The companies are then righting the wrongs on the shoulders of innocents, that most likely never were racists to begin with. In short, just committing to another mistake.

> These population centers are some of the most diverse regions of the world.

South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) has actually one of the most ethnically "pure" populations in the world.

> South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) has actually one of the most ethnically "pure" populations in the world.

Northeast China, usually defined as the provinces of Liaoning, Jilin, and Heilongjiang, does not belong on your list. According to the 2000 Chinese census, about 10% of the population of Northeast China comes from ethnic minorities, the majority of whom are Manchus, but also including significant numbers of Mongols and Koreans. That is far from being 'one of the most ethnically "pure" populations in the world', especially when compared to Japan or Korea.

Indeed, even though Northeast China was (in 2000) approximately 90% Han, prior to the 19th century Han were a minority in the region, and Manchus were the numerically (and politically) dominant ethnic group.

According to the 2000 Census, the most ethnically homogenous part of China is not the North or Northeast, but rather Eastern China, which is over 99% Han (and, as well as being over 99% Han overall, 4 of its 7 provinces are over 99% Han too.) By contrast, North China is about 94% Han and Northeast China is only around 90% Han.

(There have been two Chinese censuses since, in 2010 and 2020, but I can't find ethnicity figures for them.)

This is so ridiculously ignorant.
Your comment implies black engineers will check that Malcolm X Boulevard is pronounced correctly. That's awfully specious.
Alternatively it just implies white engineers never have their GPS taking them through Harlem.
Yes, all engineers are white </sarcasm>.

Or how about this one: Yes, all black engineers on the maps team live in New York.

Truth is that it is just one example of the thousands of edge cases that exist in these types of complex products, and some of them will look like they have some sinister basis.

Or that Google Maps is primarily developed in Australia for a worldwide market.
Did the geo team get moved from Seattle?
It really doesn't. There are more things, Horatio.
As others noted, just because someone is black doesn't mean that they would have caught this. The whole point of ML is to adapt to what is effectively an unbounded set of inputs. Pretty much by definition, there will be cases where even a team of 100% black people will train a model that, given the right input, will fail in ways that particularly affect black people.
> Facebook, like a lot of tech companies, has long had problems with diversity in engineering.

If that is the case, why is it that Google voice nav routinely butchers the names of places and roads in India in spite of having thousands of Indian engineers on staff?

Could we blame the intractability of the problem, or just plain old incompetence, before we blame every single problem in the world on racism and lack of 'diversity'?

Strong agreement here. The impulse to attribute any mishap in anything race-adjacent to racism is one of the most destructive memes at the moment.

It forces a worldview where malice is the default assumption and encourages the "enemies all around us" mindset.

Another example: Apple Maps pronounces “Jai Ho” as “high hoe”. Apparently Apple has too many Latino engineers and not enough Indian engineers?
Maybe, but in the particular case you mentioned there is a specific word, "jai", that is pronounced as "high". See Jai Alai, which has been absorbed into English.
Given that the goal of racism is to structure society, and given how well that succeeded in America, I don't think it's unreasonable to ask whether it's at play in pretty much any situation where we see racially biased outcomes.

But it is an excellent question why Google Maps is still terrible at Indian place names even though they have plenty of people internally who not only could help, but would be delighted to. The answer to that will be essentially sociological. If you think that answer in no way includes structural inequity despite it being pervasive in America since its founding, you will have to explain how you think Google managed to eliminate that in the Maps division and then managed to re-introduce some sort of structure that leaves a wealth of internal knowledge untapped.

> structural inequity despite it being pervasive in America since its founding

America is not unique in this. And African-Americans are not the only people in the world who were enslaved. What is unique is that America and Americans are so good at controlling narratives and sucking oxygen out of rooms that other stories and catastrophes are forced into irrelevance.

America is not unique in this. But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.
> But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.

Sure it is. But if Facebook, Google and other American companies want to indulge their Americentric proclivities to the detriment of everyone else, they should voluntarily withdraw from the rest of the world.

> Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there

Silly Google TTS, the proper pronunciation is obviously "Malcolm the Tenth" there.

Google Maps is made in Australia, and the diversity there is different.