by frankie_t 2141 days ago
I don't think this is inherent to English or to any other language (except perhaps in specific cases where there is no word with a similar meaning).

I think that in general we pack a concept into a word and lose some information in the process, so when you want to be precise about what you are saying, you have to bring your definitions with you. Essentially, with translation you take a concept and "pack" it into a word, then look for an equivalent packing in a different language, then unpack it. Naturally, this process is prone to losing information.


I cannot agree. Not only does English seem to have many homonyms (the word "spring" alone has more than two meanings), its grammar is also somewhat primitive. Let me give one more example, this time the other way around: Google Translate from German to Russian. The verb "tragen" ("to wear") is translated as "износ" ("a wear"), which is a noun. Going through English, we lose important information: there is no clue what part of speech the word "tragen" is.

This isn't an issue for any reasonably long fragment of text, which will be translated properly thanks to context analysis. Still, if the text were analyzed in German in the first place, this would be less of a problem.

You're confusing two separate things: linguistic complexity and ambiguity.

Linguistic complexity is hard to measure, but it's not hard to show that, at least morphologically, English is less complex than many other languages.

This doesn't necessarily mean that English is more ambiguous, though. Unlike German, English typically has very rigid word order, so in the context of a sentence, you'll know if a specific word is a noun or a verb.
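To make the word-order point concrete, here is a toy sketch that guesses noun vs. verb for an English word purely from the word immediately before it, using "wear" from the example above. The cue lists and the function name `guess_pos` are hypothetical; real part-of-speech taggers use statistical models over much wider context, so this only illustrates the idea.

```python
# Toy, hand-written context rules for English noun/verb ambiguity.
# The word lists are illustrative assumptions, not a real tagger's rules.
DETERMINERS = {"a", "an", "the", "my", "your", "his", "her", "their"}
VERB_CUES = {"to", "will", "can", "i", "we", "they", "you"}


def guess_pos(tokens: list[str], i: int) -> str:
    """Guess whether tokens[i] is a noun or a verb from the word before it."""
    prev = tokens[i - 1].lower() if i > 0 else None
    if prev in DETERMINERS:
        return "noun"  # "the wear", "a spring"
    if prev in VERB_CUES:
        return "verb"  # "to wear", "I wear"
    return "ambiguous"  # no cue: the preceding word alone does not settle it


print(guess_pos("the wear on the tyres".split(), 1))  # noun
print(guess_pos("I wear a hat".split(), 1))           # verb
print(guess_pos(["wear"], 0))                          # ambiguous
```

A bare word with no context falls through to "ambiguous", which is roughly the position a word-by-word pivot translation is in.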

The problem here is that many NLP models inadequately capture syntactic structure.

Sorry if that was confusing; I wanted to mention both (a) lexical ambiguities and (b) syntactic ambiguities as possible obstacles for NLP.

> Unlike German, English typically has very rigid word order, so in the context of a sentence, you'll know if a specific word is a noun or a verb.

So you say you are able to guess from the word order what part of speech a particular word is. But with German you hardly need all this guessing.

If you compare two marginal examples:
- English "time flies like an arrow"
- German "Wenn Fliegen fliegen hinter Fliegen..."

you'll find that the English one has far more possible interpretations.

> So you say you are able to guess from the word order what part of speech a particular word is. But with German you hardly need all this guessing.

Not really. It's not about guessing: in English, the part of speech of a word really is mostly determined by the syntactic structure of the sentence.

> If you compare two marginal examples:
> - English "time flies like an arrow"
> - German "Wenn Fliegen fliegen hinter Fliegen..."

Not sure what you're trying to say here. The English example is ambiguous, yes (but only strictly grammatically; semantically the meaning is clear, unless you're using it in the phrase "time flies like an arrow, fruit flies like a banana", which is meant as a linguistic joke). It's also very easy to come up with phrases or sentences that are ambiguous in German, or in any language for that matter. Here are some fun examples:

"Er liest das Buch seiner Schwester vor" (could either mean "he's reading the book to his sister" or "he's reading his sister's book to someone")

"der weiße Schimmel" ("white mould", or "white horse")

"wilde Tiere jagen" ("to hunt wild animals", or "wild animals are hunting")

and don't even get me started on the ambiguity of compound words or phrases with a genitive, where there are often tons of potential interpretations depending on the intended relationship between head and dependent noun.

And also the German example you gave (fully: "Wenn Fliegen hinter Fliegen fliegen, fliegen Fliegen Fliegen nach", or "if flies fly behind flies, flies fly after flies") is a) another joke sentence nobody uses in practice, and b) is exactly a case where you can only distinguish the part of speech (and the grammatical case) of a word from the syntactic structure and not from its morphology, something you claimed doesn't happen in German, but here it clearly does.

Look, you could make the case that it's easier for English sentences to be ambiguous than for sentences in some other languages, but I would need to see some good data before I believed that claim, because it's just not something that is immediately obvious.

I still think you're missing my point, although I am impressed by your German skills ("der Schimmel" is, by the way, just a homonym; it's hardly related to the topic of syntactic ambiguity).

> is a) another joke sentence nobody uses in practice, and b) is exactly a case where you can only distinguish the part of speech (and the grammatical case) of a word from the syntactic structure and not from its morphology, something you claimed doesn't happen in German, but here it clearly does.

I didn't make such a strong claim. All I wanted to say is that in German, syntactic ambiguities are much less of a problem than in English. I gave two anecdotal examples so you could compare the possible ambiguities in both languages; both are indeed nothing but jokes.

But let's take a closer look at them once again.

a) "Time flies like an arrow": the word "time" can be 1) a noun 2) an adjective 3) a verb in declarative form 4) a verb in imperative form. This gives us a factor of 4 on the very first word of the sentence.

b) "Wenn Fliegen hinter Fliegen fliegen": the ambiguity exists only between "fliegen" as a verb and "Fliegen" as a plural noun, so the "ambiguity factor" of the word "f/Fliegen" is just 2.
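The "ambiguity factor" arithmetic above can be sketched as a product of per-word reading counts, which gives a crude upper bound on the number of part-of-speech sequences before any syntax is applied. The four readings of "time" and the factor of 2 for "f/Fliegen" come from the comments; the readings for the other words and the helper `ambiguity_factor` are assumptions for illustration.

```python
from math import prod

# Hypothetical per-word readings, lowercased, ignoring grammar entirely.
# "time" uses the four readings claimed above; the rest are assumptions.
ENGLISH = {
    "time": {"noun", "adjective", "verb-declarative", "verb-imperative"},
    "flies": {"noun", "verb"},
    "like": {"verb", "preposition"},
    "an": {"determiner"},
    "arrow": {"noun"},
}

GERMAN = {
    "wenn": {"conjunction"},
    "fliegen": {"noun", "verb"},  # case ignored, so f/Fliegen counts as 2
    "hinter": {"preposition"},
}


def ambiguity_factor(sentence: str, lexicon: dict[str, set[str]]) -> int:
    """Upper bound on tag sequences: product of per-word reading counts."""
    words = sentence.lower().split()
    return prod(len(lexicon[w]) for w in words)


print(ambiguity_factor("Time flies like an arrow", ENGLISH))            # 16
print(ambiguity_factor("Wenn Fliegen hinter Fliegen fliegen", GERMAN))  # 8
```

A real parser would rule out most of these sequences using syntax, so the product is only a rough way to compare lexical ambiguity, not a count of actual parses.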

> but I would need to see some good data before I believed that claim, because it's just not something that is immediately obvious.

Fair enough.

>> I think that in general we pack a concept into a word and lose some information in the process, so when you want to be precise about what you are saying, you have to bring your definitions with you. Essentially, with translation you take a concept and "pack" it into a word, then look for an equivalent packing in a different language, then unpack it. Naturally, this process is prone to losing information.

This is a plausible description of how humans perform translation, but it does not apply to machine translation, because we have no good way to represent the meaning of a word other than with the word itself. Consequently, machine translation systems can't distinguish between different meanings of the same word; instead, they try to produce a correct translation using frequency-based heuristics. Faced with two likely translations of a word, a system will try to determine the word's context (in terms of its collocations with other words) and then assign the word the meaning it has in whichever context is most common in its training dataset. Clearly, that is like "flying blind": sometimes it will work, sometimes it will fail, and there's no way to predict beforehand which.
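A toy version of the frequency-based heuristic described above might look like this: count which context words co-occur with each translation of an ambiguous word in labeled training data, pick the translation whose contexts overlap most, and fall back on the most frequent translation when there is no overlap. The training sentences, the choice of "spring" (from the example earlier in the thread), and the function names are all hypothetical; this is a sketch of the general idea, not how Google Translate or any particular system works.

```python
from collections import Counter, defaultdict

TARGET = "spring"

# Hypothetical labeled data: English sentences with the French word
# a translator would choose for "spring" in each one.
TRAINING = [
    ("the flowers bloom in spring", "printemps"),
    ("we travel every spring", "printemps"),
    ("spring is my favourite season", "printemps"),
    ("the mattress spring is broken", "ressort"),
    ("a steel spring in the clock", "ressort"),
]


def train(examples):
    """Count context-word co-occurrences per translation, plus overall frequency."""
    cooc = defaultdict(Counter)
    prior = Counter()
    for sentence, translation in examples:
        prior[translation] += 1
        for word in sentence.split():
            if word != TARGET:
                cooc[translation][word] += 1
    return cooc, prior


def pick_translation(context, cooc, prior):
    """Pick the translation whose training contexts share the most words with
    this context; ties (including no overlap at all) go to the most frequent."""
    words = [w for w in context.lower().split() if w != TARGET]
    scores = {t: sum(cooc[t][w] for w in words) for t in prior}
    return max(scores, key=lambda t: (scores[t], prior[t]))


cooc, prior = train(TRAINING)
print(pick_translation("the steel spring snapped", cooc, prior))  # ressort
print(pick_translation("spring", cooc, prior))                    # printemps
```

The bare word "spring" simply gets whichever reading dominates the training data, right or wrong, which is the "flying blind" behaviour the comment describes.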

The comment above gave the "spring" example; my go-to example is asking Google Translate to translate the Greek "χελιδόνι" (the bird, swallow) into French and getting "avaler" (the verb, to swallow) instead of the correct "hirondelle". Again, this is because the translation goes from Greek to French via English, introducing an ambiguity about the intended meaning of "swallow" that exists in neither Greek nor French. Note that this doesn't happen when "χελιδόνι" is used in a sentence (e.g. "ένα το χελιδόνι" translates to "un l'hirondelle", which is ungrammatical and nonsensical but at least gets the right noun), but it's a good test showing that Google Translate is really incapable of recognising the meaning of words and so cannot use such information to make translations. The same goes for machine translation in general, since Google Translate is a state-of-the-art system.
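The pivot problem can be sketched with two toy dictionaries: Greek to English, then English to French. If only the English string survives the middle step, the French side has to guess between the verb and the noun; if a sense label (here just a part of speech) is carried along, the right word comes out. The dictionaries, their assumed frequency ordering, and both function names are hypothetical, meant only to show where the information is lost.

```python
# Hypothetical toy dictionaries; entries are (word, part of speech).
EL_EN = {"χελιδόνι": ("swallow", "noun")}  # the bird
EN_FR = {
    # Assumed to be ordered by frequency, verb first.
    "swallow": [("avaler", "verb"), ("hirondelle", "noun")],
}


def pivot_bare(word: str) -> str:
    """Greek -> English -> French, keeping only the English string."""
    english, _pos = EL_EN[word]  # the part of speech is thrown away here
    return EN_FR[english][0][0]  # so the most frequent French entry wins


def pivot_with_sense(word: str) -> str:
    """Same pivot, but the part of speech travels with the English word."""
    english, pos = EL_EN[word]
    for french, french_pos in EN_FR[english]:
        if french_pos == pos:
            return french
    return EN_FR[english][0][0]  # no match: fall back to the most frequent


print(pivot_bare("χελιδόνι"))        # avaler
print(pivot_with_sense("χελιδόνι"))  # hirondelle
```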

I think it is inherent to English at least in degree.

Reading books translated from Polish or Russian into Spanish conveys a lot more information than reading the same books in English translation.

It's like every subtle nuance is lost in English.

I think there could be multiple factors, such as your language skill level, translation quality, the "closeness" of the languages, and a psychological predisposition towards your native language.

Depending on the form of what you are translating, it can simply be impossible to translate properly (poetry, for example).

I'm currently reading the English translation of "Black Obelisk", which I believe was originally written in German, and to me it isn't any worse than the Russian translation (Russian being my native language).

In any case, what was originally asserted is that English is somehow worse than other languages as a "transitional" translation language for words or simple phrases, so I argued against that specific idea. Translating literary works is a subject of its own, and one where quality is much harder to measure.

Looking at translations as if they represent the languages themselves seems to be a common beginner trope; I certainly made that mistake.

Funny that the upstream commenter essentially praises Spanish as superior to English, Spanish being the language I dismissed as less expressive than English when I was a noob.

A bad Netflix or literary Spanish translation, for example, is full of frustrating ambiguity ('Who is the antecedent of "su" ("his/her/their") here? "Bajó" ("went down")? Who bajó?? At least in English we have to use "he" or "she"!'). With experience, you realize that native Spanish writers keep things disambiguated; it's just crappy translation shortcuts that don't. Trying to compare translations is only an exercise in comparing translators.

Though you said it better than I did.