by jchw 2814 days ago
You know, though, machine translators have long been able to make subjective choices in translations. We deem them correct because a human can verify that the translation carries roughly the same intent, meaning, tone, etc. Not because it matches exactly what a human says.

Secondly, you are conflating concepts in my opinion. Localizing a movie may involve translators translating lines, but it also involves the creative work of localizing the title and other things, as you mentioned. A machine translator by today's definition translates a string of text in one language to a string of text in another. We needn't consider every type of work a human translator might do; it would be quite enough of a difference to close the gap on translating strings straightforwardly.

1 comments

This presumes you can translate all strings straightforwardly. You can't. There are times when I've been given a string and needed an in-depth, 30-minute discussion to understand enough of the surrounding context to produce a result. In certain cases no mapping exists.

Also, anyone who is able to verify that a translation conveys roughly the same meaning as the original utterance by definition doesn't need a translation, as they know both the source and target languages.

It's everybody else, the people who can't verify it, for whom accuracy matters, because they have no recourse but to trust it. They are frequently led astray.

A couple of examples to illustrate.

掘り炬燵 (horigotatsu) is a noun referring to a "low, covered table placed over a hole in the floor of a Japanese-style room".

Now, given that this is something that doesn't exist in any Western, English-speaking country, it simply doesn't have a mapping in English. The best that can be done is to give an explanation of what it is.

Google Translate "translates" it as "digging". Welcome to the last mile. In this case Google should just spit out an explanation of what it is. "Digging" is entirely incorrect and unhelpful.

But it gets worse. Imagine if it's used in a sentence. Here is a good example of a last mile issue in translation. It's impossible for you to translate it directly, so you have to fall back to a best effort attempt and either simplify and lose some information or stop mid-sentence and give an explanation of what the thing actually is.

掘り炬燵に座ってご飯を食べてた。

This sentence is all kinds of problematic from a translation point of view.

Google translates it as: "I sat on a digging stone and ate rice."

That borders on D+/C- in terms of quality for me. But there are a few good reasons as to why.

The original Japanese doesn't say who is performing the action, because that's simply not necessary in Japanese: it's almost always inferred from context in the moment, and that context gets lost when you only have a string. Thus the subject could be "he, she, it, we, I, they". If the machine is forced to pick one option, it will pick one.

Then there is the horigotatsu part which gets "translated" as "digging stone". What the hell is a digging stone? It ought to just say horigotatsu* and have a footnote. Machine translation today doesn't do footnotes. I wish it did.

Again, there is a lack of context as to the meaning of ご飯 (gohan), which technically can mean cooked white rice but in this case most likely refers to a "meal". Which meal is not specified: it could be breakfast, lunch or dinner, but I'm going to guess it's dinner.

But what should the translation actually be? Is it even fundamentally "translatable"?

One valid translation would be "we sat in the horigotatsu and had dinner". That still requires an explanation.

Anyway, I hope it's a little clearer what I mean when I say that it's not actually always possible to translate things.

I think we can hit parity with humans one day, but it requires fundamentally rethinking certain things at a UX level. For instance, if Google Translate were more like a chatbot that could probe for more context when needed, instead of just an input form, that would be closer to my idea of where things ultimately need to wind up. Or perhaps a model like Rap Genius, where annotations contain extra details about possible alternatives and why the current word was chosen. This is my 2 cents on the issue.
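
To make the footnote and annotation idea a bit more concrete, here is a rough sketch of what such an output could look like as a data structure, using the horigotatsu sentence from above. Everything here (`Annotation`, `AnnotatedTranslation`, `render`, the field names) is made up for illustration and is not how Google Translate or any real system works; the translation and notes are entered by hand, not produced by a model.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A footnote attached to one span of the translated text (hypothetical)."""
    span: str   # the word or phrase in the output being annotated
    note: str   # why it was chosen, or what it means
    alternatives: list[str] = field(default_factory=list)


@dataclass
class AnnotatedTranslation:
    """A translation plus footnotes and questions back to the user (hypothetical)."""
    source: str
    text: str
    annotations: list[Annotation] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the text with numbered footnote markers, notes, then questions."""
        body = self.text
        notes = []
        for i, ann in enumerate(self.annotations, start=1):
            # Mark only the first occurrence of the annotated span.
            body = body.replace(ann.span, f"{ann.span}[{i}]", 1)
            line = f"[{i}] {ann.note}"
            if ann.alternatives:
                line += " Alternatives: " + ", ".join(ann.alternatives) + "."
            notes.append(line)
        questions = [f"? {q}" for q in self.open_questions]
        return "\n".join([body, *notes, *questions])


# Hand-written example based on the sentence discussed above.
example = AnnotatedTranslation(
    source="掘り炬燵に座ってご飯を食べてた。",
    text="We sat in the horigotatsu and had dinner.",
    annotations=[
        Annotation("We", "Subject is not stated in the source; guessed.",
                   ["I", "he", "she", "it", "they"]),
        Annotation("horigotatsu", "A low, covered table placed over a hole "
                   "in the floor of a Japanese-style room."),
        Annotation("dinner", "ご飯 (gohan) can mean cooked rice or a meal; "
                   "which meal is not specified.", ["breakfast", "lunch", "rice"]),
    ],
    open_questions=["Who was eating?", "Which meal was it?"],
)

print(example.render())
```

The point of `open_questions` is the chatbot-style probing: instead of silently picking "I" or "we", the tool could ask, and the alternatives list covers the Rap Genius style "why this word" part.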

No, I am not presuming every sentence has a straightforward translation, just suggesting that a meaningful measure for the "last mile" of machine translation would be reaching human parity at that specifically.

Being able to provide additional context would be great, but I don't see why it would have to be done in a "human" way to satisfy the constraints.