by Barrin92 2242 days ago
> This is why, after decades of controversy, Google Translate still renders the gender-neutral “they are doctors” in German as “sie sind Ärzte” (masculine) and “they are nurses” as “sie sind Krankenschwestern” (feminine). Google Translate was not programmed to be sexist. The corpus of texts it received happened to contain more instances of male doctors and female nurses. [...] BriefCam’s spokesperson added that they used “training datasets consisting of multi-gender, multi-age and multi-race samples without minority bias,” but declined to provide any evidence or details.

Okay, will we at some point admit that we want machine learning algorithms to be able to interface with symbolic rules rather than pretending that everything is a data issue as if we're living in the era of 1930s inductionism? It's clear that the misgendering issue here is not a stochastic one that ought to be solved by 'balancing out' data, it's that we want to impose strict linguistic rules and constraints on a system in a clear manner.

There really needs to be more work done in AI that makes it possible to interface with the models we build, rather than reframing everything as a data problem, shoving it into some end-to-end black box, and hoping that whatever comes out the other end is correct.

The automated systems used in the article are supposed to make judgements about "Detection of body movements that constitute assault." This requires genuine understanding and high-level capacity to reason rather than just pixel-based inference from some camera.

3 comments

The Lithuanian language works in a very similar way: daktaras (male doctor), daktarė (female doctor).

A sentence like 'doctor said no' gets translated into 'daktaras pasakė ne'. Even a human could not translate it any better unless the wider context is known, which could only be derived from other sentences in the paragraph.

> at some point admit that we want machine learning algorithms to be able to interface with symbolic rules rather than pretending that everything is a data issue

That is a long way off. We find ourselves in a situation where we have models that do useful things, but are also horrendously biased.

If you want improvement now, the quickest way to get it is recasting benchmarks. Most NLP research groups just hill climb on GLUE or whatever. What gets measured gets managed, and bias is not being systematically measured.
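As a rough illustration of what "measuring" could mean here, the sketch below checks which gender a translation commits to when the source sentence specifies none. Everything in it is an assumption for illustration: `GENDERED_FORMS`, `gender_by_sentence`, and `toy_translate` are hypothetical names, the lookup table is tiny, and the stand-in translator just mirrors the two outputs quoted at the top of the thread. It is not how GLUE or any existing benchmark works.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical lookup: German plural occupation forms and the gender they mark.
GENDERED_FORMS: Dict[str, str] = {
    "Ärzte": "masculine",
    "Ärztinnen": "feminine",
    "Krankenpfleger": "masculine",
    "Krankenschwestern": "feminine",
}


def marked_gender(translation: str) -> Optional[str]:
    """Return the gender of the first known occupation form, or None."""
    for word in translation.rstrip(".").split():
        if word in GENDERED_FORMS:
            return GENDERED_FORMS[word]
    return None


def gender_by_sentence(
    sentences: Iterable[str], translate: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Map each gender-neutral source sentence to the gender its translation picks."""
    return {s: marked_gender(translate(s)) for s in sentences}


def toy_translate(sentence: str) -> str:
    """Stand-in for a real MT system; mirrors the outputs quoted in the thread."""
    table = {
        "they are doctors": "sie sind Ärzte",
        "they are nurses": "sie sind Krankenschwestern",
    }
    return table[sentence]


report = gender_by_sentence(["they are doctors", "they are nurses"], toy_translate)
for source, gender in report.items():
    print(f"{source!r} -> {gender}")
```

A real benchmark would need many occupations, several sentence templates, and a proper morphological analysis instead of a word list, but even a per-occupation table like this makes the skew something you can track over time.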

Right now language models can't even compare ages that fall outside the 20-100 range [1], and it is really unlikely that we are going to quickly get to architectures with symbolic reasoning sophisticated enough to squelch bias.

To Google's credit, they are working to measure (and therefore manage) bias [2].

[1] https://arxiv.org/pdf/1912.13283.pdf

[2] https://www.youtube.com/watch?v=XR8YSRcuVLE

Google translates it that way because German nouns have grammatical gender, and doctors are grammatically male, while nurses are grammatically female. It has nothing to do with training ML models (except insofar as most training input is grammatically correct).

Of course, some people are unhappy about it and experimenting with gender stars and gender gaps and other creative orthography, but this is the status quo.

It's quite strange to demand that some ML model is more "progressive" than society and speakers themselves.

The issue here is that grammatical gender is not useful information when reasoning about the semantics of a sentence. Toothpaste in German is grammatically feminine, but that's no reason for an ML system to make feminine assumptions about toothpaste after it combs through data. This has nothing to do with progressive values; the problem is that the ML system cannot distinguish between a spurious correlation and actual meaning. Today many more medical graduates are women, so the ratio will change in the future and the inference from grammatical gender will be wrong. We should be able to tell an intelligent system from the get-go to ignore something we know to be spurious rather than fiddling around with the data.

And it's not strange at all to demand of an automated system that it behave exactly the way we want it to behave. It is not human; it has to be more precise because it is rolled out at scale, and it needs to do what we tell it to do. When we use industrial machinery in manufacturing we don't go "ah well humans are only precise down to a centimetre, guess we'll let it slide". An automated car trained on speeding drivers must not learn to speed. Automated systems are faster than humans, so errors compound, which requires more precision on the part of the machine. If an ML system accidentally learns that ignoring someone wearing a green shirt is okay, and that is rolled out to a million cars, you don't have an accident but a big disaster.

We need a way to interface with ML systems in ways that let us put precise limits on when it makes inferences from data, why it made those inferences and when to follow logical rules, and when to dispose of certain data.
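To make that concrete, here is a minimal sketch of one way a hard rule could sit on top of a statistical model: the model proposes scored candidates, and a symbolic constraint rejects any candidate that asserts a gender the source sentence does not specify. All names (`Candidate`, `source_gender`, `choose`), the scores, and the pronoun rule are assumptions made up for illustration; this does not describe how Google Translate or any real system works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    score: float           # model confidence (made-up numbers below)
    gender: Optional[str]  # gender the candidate asserts; None if it asserts none


def source_gender(source: str) -> Optional[str]:
    """Crude symbolic rule: which gender, if any, the English source specifies."""
    words = set(source.lower().split())
    if words & {"he", "him", "his"}:
        return "masculine"
    if words & {"she", "her", "hers"}:
        return "feminine"
    return None


def choose(source: str, candidates: List[Candidate]) -> Candidate:
    """Apply the hard rule first, then defer to the model's ranking."""
    licensed = source_gender(source)
    allowed = [c for c in candidates if c.gender is None or c.gender == licensed]
    pool = allowed or candidates  # if the rule rejects everything, keep all candidates
    return max(pool, key=lambda c: c.score)


neutral_case = choose(
    "the doctor said no",
    [
        Candidate("der Arzt sagte nein", 0.7, "masculine"),
        Candidate("die Ärztin sagte nein", 0.2, "feminine"),
        Candidate("der Arzt / die Ärztin sagte nein", 0.1, None),
    ],
)
marked_case = choose(
    "she is a doctor",
    [
        Candidate("sie ist Arzt", 0.6, "masculine"),
        Candidate("sie ist Ärztin", 0.4, "feminine"),
    ],
)
print(neutral_case.text)
print(marked_case.text)
```

The point of the design is that the rule is stated once, explicitly, and overrides the learned scores no matter what the training data looked like; the data only decides among candidates the rule allows.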

Why does grammatical gender exist for toothpaste? I honestly don't understand this (English speaker).

Every noun has a grammatical gender in German. It just does. There does not need to be a (contemporary) reason.

Why does declension of nouns exist in German, barely exist in English, appear more prominently in Latin, and even more so in Ancient Greek?

Is one of these languages defective? Should they shed their cases?

Maybe. I just can't wrap my head around what useful function they provide for communication. I'd love to be enlightened.

Der Arzt/die Ärztin, der Pfleger/die Pflegerin (I think "Krankenschwester" isn't being used formally anymore)

Occupations generally have a male and a female variant and as far as I know you are required to address both in formal speech (so I wouldn't call this an "experiment" anymore).

The only area where I'd say this isn't implemented yet is informal speech. Few people watch their day-to-day language for gender neutrality.

> Der Arzt/die Ärztin, der Pfleger/die Pflegerin (I think "Krankenschwester" isn't being used formally anymore)

Now you are being sexist. Of course Krankenschwester is used, but it has a different meaning than Krankenbruder.

> Occupations generally have a male and a female variant and as far as I know you are required to address both in formal speech (so I wouldn't call this an "experiment" anymore).

In many languages there are occupations which are specifically tied to one gender.

> The only area where I'd say this isn't implemented yet is informal speech. Few people watch their day-to-day language for gender neutrality.

This has become something disgusting. A mother is a mother. A father is a father. A mother can give birth, a father cannot. Some things will always be gender specific. Or shall we start addressing people gender-neutrally: Mr./Mrs. Donald Trump?

I'm not sure I totally understand what you're trying to communicate, but I didn't say we need to invent gender-neutral terms for every German word. I said formal German is moving away from describing people using words that only exist as either masculine or feminine. So depending on where you sample your data, it's not unusual to encounter these speech patterns. (Of course, whether an ML algorithm can pick that up is a different question.)

And (this is an opinion now) I don't think that's "disgusting". Agreed, it's kinda clunky to always mention both variants but stereotypes are subtle and this is one thing we can do that's not too invasive. Of course this isn't an ideal solution either because gender non-conformity exists and the German language has no good way of addressing a lot of that. (But there are deeper cultural problems there.)

Also, I'm not sure what you mean by "Krankenbruder" but it sounds like something from /r/ich_iel.

I think you're right, but you're being downvoted by people who don't speak German and don't understand how "sexist" the language really is, even if Germans themselves are as enlightened on the subject as one could be.

People forget that English has been reinventing itself for the last hundred years and is not showing any signs of slowing down, whereas German-speaking culture is a little more resistant to the mores of newspeak. Until you have lived and loved in both languages, it's not really clear how equitable German culture is in comparison to English. A German might say there is a level of respect in calling something feminine, whereas an English-only speaker might react with abhorrence. Such misunderstandings are to be expected across the language barrier.