by danShumway 1248 days ago
Do you have an example? Most languages I've seen that do this seem to be deriving gendered word variants from the pronoun, at least as far as I can tell with my very limited experience.

Spanish has a lot of word variants based on feminine/masculine/plural subjects, but it's not really based on gender (otherwise inanimate objects wouldn't have those variants); as far as I can tell with my limited Spanish experience they're based on word agreement in the sentence with the pronouns or at most with masculine/feminine presentation.

Is there ever a scenario where, "el gata" would be correct in Spanish? "Gato" agrees with the "el" pronoun; it's not based on the gender identity of the noun independently of the pronoun, is it? Are there other languages that work differently?

This also seems like a problem that doesn't really require knowledge of gender identity as much as "do you want us to use masculine/feminine variants of words when referring to you?" -- something that seems easy enough to guess based on the pronoun or (when loading up a language translation that needs more advanced logic) to just outright ask the user.

I kind of hate the software trend towards "we need to derive everything we're doing from first principles"; I feel like a lot of these problems could be solved by saying, "when we encounter an edge case we'll ask what to do, rather than doing data collection up-front that will be irrelevant for the majority of users."

> Do you have an example? Most languages I've seen that do this seem to be deriving gendered word variants from the pronoun, at least as far as I can tell with my very limited experience.

That's a weird way to put it. Words have grammatical gender. In some languages, like Spanish, there are articles like "el" or "la" that go with that, but in other languages, like Russian, there's no article.

For things in Spanish the grammatical gender is fixed. Eg, a window is always a feminine word. For cats it of course depends on the cat.

You're not matching the word to the article, but the other way around. "ventana" is a feminine word, so there's always a "la" before it.

> Is there ever a scenario where, "el gata" would be correct in Spanish?

Not in that specific case, but there are rare nouns where both are valid, eg "el mar" and "la mar", and sometimes with a subtly different meaning used for poetic effect.

So genuine question, because I am in the middle of building a dialog system that I'd like to work with multiple languages -- it sounds like in the worst-case scenario we can encompass all of that behavior by just having a toggle in settings next to the pronouns for specifically those languages: "use masculine word variants / use feminine word variants".

If even that; if I know someone uses "el", then "el mar" isn't a problem, and I know to use masculine word variants in other locations. Is there any scenario where knowing that a user/player uses "el" to refer to themselves wouldn't allow you to derive what gender-variant of another word to use when referring to them?

I guess if someone is using completely agender pronouns (I don't know what that would be in Spanish) I'd need to ask about feminine/masculine word variants, but I'm still struggling to see why I need to know their actual gender.
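
For what it's worth, the settings toggle described above might look something like this in code. This is only a sketch: `PlayerProfile`, `Agreement`, `PRONOUN_DEFAULTS`, and `agreement_for` are made-up names, and it assumes that a masculine/feminine choice per language is all the translator needs, which is exactly the open question here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Agreement(Enum):
    """Which set of gendered word variants to use when referring to the player."""
    MASCULINE = "masculine"
    FEMININE = "feminine"


# Hypothetical lookup: pronouns from which the agreement can be guessed
# without asking. Anything not listed here falls through to "ask the player".
PRONOUN_DEFAULTS = {
    ("es", "él"): Agreement.MASCULINE,
    ("es", "ella"): Agreement.FEMININE,
    ("ru", "он"): Agreement.MASCULINE,
    ("ru", "она"): Agreement.FEMININE,
}


@dataclass
class PlayerProfile:
    name: str
    # Pronoun the player chose, keyed by language code.
    pronouns: dict = field(default_factory=dict)
    # Explicit "use masculine/feminine word variants" toggle, keyed by language code.
    agreement_overrides: dict = field(default_factory=dict)

    def agreement_for(self, lang: str) -> Optional[Agreement]:
        """Return the agreement to use, or None if the game should ask the player."""
        if lang in self.agreement_overrides:
            return self.agreement_overrides[lang]
        pronoun = self.pronouns.get(lang)
        return PRONOUN_DEFAULTS.get((lang, pronoun))
```

A `None` result is the "when we encounter an edge case we'll ask what to do" path: the game only shows the extra toggle for languages where the pronoun alone doesn't settle it.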

Translation is actually a very tough problem, especially in games.

Take a sentence like "$PERSON picked up $ITEM".

Russian requires knowing the gender of $PERSON because the verb "to pick up" is modified depending on the gender of the person. It also needs the accusative declension of the $ITEM. In Russian you don't just say "book" in every possible context like in English; the word "book" gets different endings depending on the context it's used in. A bit like how verbs vary in English: become, became.
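
To make that concrete, here is a rough sketch of what the lookup for this one sentence could look like. The tables and the `picked_up` helper are invented for illustration and cover only two hand-entered items; a real game would get these forms from its translators rather than from code like this.

```python
# Past-tense forms of the Russian verb "поднять" (to pick up), keyed by the
# grammatical gender of the subject.
PICKED_UP = {
    "masculine": "поднял",
    "feminine": "подняла",
}

# Hand-entered item forms: nominative (dictionary form) and accusative
# (the form needed for the direct object of "picked up").
ITEMS = {
    "book": {"nominative": "книга", "accusative": "книгу"},
    "sword": {"nominative": "меч", "accusative": "меч"},
}


def picked_up(person: str, gender: str, item: str) -> str:
    """Build the Russian for '$PERSON picked up $ITEM'."""
    verb = PICKED_UP[gender]
    obj = ITEMS[item]["accusative"]
    return f"{person} {verb} {obj}."


print(picked_up("Иван", "masculine", "book"))  # Иван поднял книгу.
print(picked_up("Анна", "feminine", "book"))   # Анна подняла книгу.
```

The `gender` argument here is just the grammatical agreement for the verb; how the game obtains it is a separate question.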

You also need to be very careful with things like word play -- it just doesn't translate right. Eg, there's a point in Monkey Island where an actual monkey is used as a wrench, because "monkey wrench". That just doesn't translate, at all.

And culture. Eg, things like honorifics and the general way people talk may not necessarily translate. For instance apparently the famous Star Wars "Do not want" happened because in Mandarin shouting just "NO!" isn't a thing.

Point being, no, you can't translate simply and naively. Translating something like a game is a very serious job where you should actually talk to translators in advance if possible to figure out whether your wanted design is going to be a huge pain or not, and if something might not translate at all.

> Russian requires knowing the gender of $PERSON

But this is exactly what I'm asking -- does it actually require knowing the gender of the person, or does it just need the pronoun to line up with other word variants? Because those are two different things. I keep asking this, and people keep on replying with language examples where knowing the pronoun would be completely sufficient to translate the sentence.

Does the Russian language allow you to mismatch pronouns and gendered variants of words with each other when referring to the same subject? Because if it doesn't, you don't need to know the gender, just the pronoun, and then you need to match the gendered variants of words to the pronoun.

What is an example of a sentence where if I knew someone's pronouns in a given language but not their explicit gender identity, I would not have enough information about them to be able to translate that sentence?

----

> Point being, no you can't translate simply and naively.

This is also not what I'm asking. I'm not auto-translating games, I'm attempting to build systems where the options and information I'm collecting from players would allow a professional translator to translate that game.

I'm told up above that this requires not just knowing someone's pronouns but also their explicit gender identity. I can't find a language example where that's true.

> But this is what I'm asking -- does it actually require knowing the gender, or does it just need the pronoun to line up with the variant of the word?

That's one and the same, the way I understand your question. Gender implies a specific pronoun. Though there can be exceptions, like how the "sea" in Spanish can be either a "he" or a "she", which a famous poem uses to hard-to-translate effect by alternating between the two.

But, and I say this very seriously, very little in translation is "just" something. Eg, see this for a discussion of more issues:

https://manpages.ubuntu.com/manpages/bionic/man3/Locale::Mak...

> Does the Russian language allow you to mismatch pronouns and gendered variants of words with each other when referring to the same subject? Because if it doesn't, you don't need to know the gender, just the pronoun, and then you need to match the gendered variants of words to the pronoun.

Russian allows for sentences without any pronouns, or only neutral ones. Eg, in the sentence "I forgot", the Russian "I" doesn't indicate any gender; the gender is only present in the ending of "forgot". You have to understand Russian grammar and cases to extract it from there. And yet "I know" is gender neutral. It's funny like that.

> This is also not what I'm asking. I'm not auto-translating games, I'm attempting to build systems where the options and information I'm collecting would allow a translator to translate that game.

I'm not talking about auto anything. I'm saying that support for translation in a game is complex and requires serious planning. Any place where you compose a sentence from parts is likely a place where complexity will explode exponentially as you add support for more languages.

There are also all sorts of weird quirks to consider. Eg, if you have some sort of mystery, in Russian gender appears pretty much everywhere, so if your mystery murderer is one of the few women in the setting, then Russian makes it nigh impossible for anybody to refer to her, or for her to talk about herself, without drastically reducing the list of suspects by instantly revealing it has to be a woman.

If you truly like to suffer, have a place where you form a string of the form of "$PERSON picked up $COUNT $ITEMS".

In some languages you need the gender of $PERSON, and it'll affect the verb. You'll need to know the right declension of each item that can possibly be picked up. Plurals of course need to be accounted for, and in Russian the forms of "file" for amounts of 1, 2 and 5 are different. So a little thing like that can balloon into pages' worth of weird and complex code.
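
As a small taste of the plural part alone, here is a sketch of picking the right form of "file" for a count in Russian. The function and table names are made up for this example; the rule itself is the usual one/few/many split for whole numbers that localization libraries use for Russian, and it only handles non-negative integers.

```python
def russian_plural_category(n: int) -> str:
    """Return the plural category for a non-negative integer count in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Forms of "файл" (file) as used after a number.
FILE_FORMS = {"one": "файл", "few": "файла", "many": "файлов"}


def count_files(n: int) -> str:
    """Format a count of files, e.g. for the '$COUNT $ITEMS' part of a sentence."""
    return f"{n} {FILE_FORMS[russian_plural_category(n)]}"


for n in (1, 2, 5, 11, 21, 22):
    print(count_files(n))
# 1 файл
# 2 файла
# 5 файлов
# 11 файлов
# 21 файл
# 22 файла
```

Note this is one noun in one language; every item needs its own three forms, and other languages split the numbers differently.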

An additional fun quirk is polite language and honorifics. "Do you want some coffee?" can be said to anyone in English, has formal and informal forms in Russian and Spanish, and a whole bunch of possibilities in Japanese. If you happen to mention 5 different people in a Japanese sentence you'll likely make it clear who's your younger sister, who's a classmate, who's your superior, and who's the jerk you hate.

> Russian allows for sentences without any pronouns, or only neutral ones.

Right, but you're describing a situation where a specific singular sentence doesn't happen to have a gendered pronoun in it; you're not describing a sentence that requires knowing a person's gender. If you knew as a translator what pronouns a person commonly used, you'd still be able to translate the sentence "I forgot" in a grammatically correct way, right?

Unless Russian supports someone simultaneously using a masculine pronoun and a feminine variant of "forgot" when talking about them? But my understanding is that it doesn't.

We're not talking about a situation where we have no idea who the subject of that sentence is -- if you know what pronouns a person usually uses in Russian, it still seems like you could pretty easily translate that person saying "I forgot" -- because the important grammatical part of that is the consistency between the gendered variant of "forgot" and the gender variant of the pronouns that person usually uses.

----

> If you truly like to suffer, have a place where you form a string of the form of "$PERSON picked up $COUNT $ITEMS".

So I have looked into this a bit as part of building dialog systems for my games, and yeah, it is super complicated. But while yes, it absolutely requires writing a ton of code and supporting a ton of variants and possibly even writing specific language-dependent code for certain translations, and while yes, it does require tracking object state to a much greater degree than you typically would for a purely English game, it still doesn't seem to change anything about what information I need to ask the player during their profile setup.

I'm still having trouble finding an example of where asking a player what their pronouns are isn't sufficient information from that player to do a translation.

Understand, I am not saying that translations would be easy, I am not saying that dynamically constructed sentences would be simple (they would not be simple). I am not saying that cultural translation and differing norms and references wouldn't be intensely difficult to deal with. But I can't find an example where I need to know the player's gender. I don't know of a language that grammatically distinguishes between pronouns and gender to a degree where knowing someone's pronouns wouldn't be sufficient to determine what gender-variant words to use when referring to them.

I'm trying to imagine a scenario where someone says, "I use primarily he/him, but technically I'm actually agender", and I reply, "oh, good to know, we couldn't have done a translation with your character without also knowing about the agender part."

I think you're confused. In Spanish and other languages, both pronouns and other words depend on grammatical gender. Saying that you can “guess based on the pronoun” doesn't make sense, because the pronoun depends on gender too.

It's true that inanimate objects have kind of arbitrary genders, but for specific people, they're based on their actual gender.

> because the pronoun depends on gender too.

But you can ask for the pronoun. If you know the pronoun, you know what variants of words to use, don't you? You don't need to know if the person is transgender or what their gender identity is; if they use `el`, you use masculine word variants to refer to them.

I'm not sure what I'm missing here; the only reason why knowing the gender identity would matter is if gendered word variants are allowed to mismatch pronouns in the language.

And even in that case, does the specific gender identity matter, or do we really only need to know whether someone wants to use masculine/feminine word variants when referring to them?

Well, you could ask about the pronoun and use it to determine gender, but how is that different from asking for gender directly?

My understanding is that it's reasonably common for a chunk of agender/nonbinary people to use traditional feminine/masculine pronouns.

It just seems a bit more direct to ask how people want to be referred to; I would compare it to how typically in web forms we ask people if they go by Mrs/Ms/Mr/Dr/etc directly, rather than asking them if they have a PhD and are married.

Especially if we're talking about gendered languages where there isn't strong support for nonbinary pronouns/variants. Someone telling us that they're agender/nonbinary doesn't actually help us much in that situation; we won't know from that information alone whether to use feminine or masculine variants.