Hacker News
by ptx 2146 days ago
Machine translation can never replace real translators, unless we develop an AI with actual understanding.

Even with human-translated texts, it's usually noticeable when the translator didn't understand the subject. To make sense of such a text, you then have to reverse-engineer the translator's mapping to figure out what the original actually said.

Just as you can't properly parse HTML using only regular expressions and string substitution, you can't truly translate human language without understanding it. You have to parse the input, process the meaning of what was said, and finally serialize that meaning into the target language.
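
To make the HTML half of that analogy concrete, here is a small Python sketch (the snippet, class and function names are made up for illustration). A regular expression matches flat text patterns and has no notion of nesting, so it pairs the first `<b>` with the first `</b>` it sees. Driving the standard library's `html.parser.HTMLParser` and tracking nesting depth in its callbacks recovers the actual structure.

```python
import re
from html.parser import HTMLParser

# Hypothetical input with one <b> element nested inside another.
SNIPPET = "<b>bold <b>nested</b> tail</b>"


def regex_bold_spans(html: str) -> list[str]:
    # The non-greedy pattern stops at the first </b>, whatever it closes.
    return re.findall(r"<b>(.*?)</b>", html)


class OuterBoldText(HTMLParser):
    """Collects the full text of each outermost <b> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0               # current <b> nesting depth
        self.spans: list[str] = []   # text of each outermost <b>

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            if self.depth == 0:
                self.spans.append("")  # start a new outermost span
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.spans[-1] += data


def parser_bold_spans(html: str) -> list[str]:
    parser = OuterBoldText()
    parser.feed(html)
    parser.close()
    return parser.spans


print(regex_bold_spans(SNIPPET))   # ['bold <b>nested']
print(parser_bold_spans(SNIPPET))  # ['bold nested tail']
```

The regex result cuts the outer element in half; no amount of pattern tweaking fixes that in general, because the nesting can go arbitrarily deep.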

Subtitling adds even more problems that machine translation simply can't handle, because, like a good book translation, it's an art form.

Making good subtitles means prioritizing readability over accuracy. You have limited space for your text, and you want to keep the reading speed (characters per second) low, so you cut words ruthlessly. But you have to choose which words to cut so that the line still makes sense, which means identifying filler words you can drop, or finding ways to rephrase something into a shorter sentence.
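
The mechanical part of this is easy to sketch; the judgment is not. Below is a minimal Python sketch that measures reading speed in characters per second and greedily drops words from a fixed filler list until a line fits. The 17 CPS limit, the filler list and the function names are assumptions for illustration, not values from any particular subtitling style guide.

```python
# Assumed reading-speed limit; real subtitle style guides choose their own.
MAX_CPS = 17.0

# Hypothetical filler list, for illustration only.
FILLERS = {"well", "so", "um", "uh", "like", "actually", "really", "just"}


def chars_per_second(text: str, duration_s: float) -> float:
    """Reading speed: characters (including spaces) divided by display time."""
    return len(text) / duration_s


def cut_fillers(text: str, duration_s: float, max_cps: float = MAX_CPS) -> str:
    """Drop listed filler words, left to right, until the line fits max_cps.

    Returns a best effort if the line still doesn't fit once the
    fillers run out; it never rephrases anything.
    """
    words = text.split()
    i = 0
    while i < len(words) and chars_per_second(" ".join(words), duration_s) > max_cps:
        if words[i].strip(",.!?").lower() in FILLERS:
            del words[i]  # remove the filler; the next word shifts into slot i
        else:
            i += 1
    return " ".join(words)


line = "Well, I just really think we should, um, go home now."
print(chars_per_second(line, 2.5))  # 21.2
short = cut_fillers(line, 2.5)
print(short)  # I really think we should, um, go home now.
```

On the sample line it drops "Well," and "just" but keeps "um," simply because the line already fits by the time it gets there. That is exactly the kind of choice a human subtitler would make differently, and the code has no way to shorten the line by rephrasing it.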

You probably also want to preserve the tone and style of the dialogue, which means you have to choose the right synonyms, not just the most common ones.

And if you're creating subtitles for the hearing impaired, it becomes even more necessary to understand what's going on in the video. If someone slams a door center-screen, you can cut that from the subtitles when you have more important things to display, but if someone slams a door off-screen, you absolutely have to include it, because that's exactly the kind of information a hearing-impaired viewer needs.
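
Written down as code, the rule itself is trivial; everything hard is hidden in the `on_screen` flag, which this minimal Python sketch simply assumes someone has already supplied. The `SoundEvent` type and `keep_sound_cue` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    label: str        # e.g. "[door slams]"
    on_screen: bool   # assumed given; deciding this is the hard part


def keep_sound_cue(event: SoundEvent, space_to_spare: bool) -> bool:
    """Apply the door-slam rule described above."""
    if not event.on_screen:
        # Off-screen sounds carry information the viewer can't see, so always keep them.
        return True
    # On-screen sounds are visible, so keep them only when there's room.
    return space_to_spare


print(keep_sound_cue(SoundEvent("[door slams]", on_screen=False), space_to_spare=False))  # True
print(keep_sound_cue(SoundEvent("[door slams]", on_screen=True), space_to_spare=False))   # False
```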

Good luck training your little machine-learning network to tell which sound effects originate from objects on-screen and which originate off-screen...

I agree in general. The problem is that good human translation works as follows: the translator reads the text, decodes it into some mental representation, and then encodes that representation in the target language. Both decoding and encoding are also highly subjective (which is why works of literature can be translated in so many different ways; see, e.g., the many translations of the Bible or the Odyssey).
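
A toy Python sketch of that decode-then-encode pipeline, assuming a drastically simplified, hypothetical `Meaning` type as the "mental representation" and hand-written lookup tables standing in for actual understanding. The German and English surface forms differ ("Mir ist kalt" is literally "to me is cold"), but both connect through the same representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Hypothetical language-neutral representation: who feels what."""
    experiencer: str  # only "speaker" in this toy
    state: str        # "hunger" or "cold"


# Decoding: German surface forms -> meaning. A real translator does this with
# understanding; this lookup table is only a stand-in.
GERMAN_DECODER = {
    "ich habe hunger": Meaning("speaker", "hunger"),  # lit. "I have hunger"
    "mir ist kalt": Meaning("speaker", "cold"),       # lit. "to me is cold"
}

# Encoding: meaning -> idiomatic English.
ENGLISH_ENCODER = {
    Meaning("speaker", "hunger"): "I am hungry.",
    Meaning("speaker", "cold"): "I am cold.",
}


def decode_german(sentence: str) -> Meaning:
    return GERMAN_DECODER[sentence.lower().strip(" .!")]


def encode_english(meaning: Meaning) -> str:
    return ENGLISH_ENCODER[meaning]


def translate(sentence: str) -> str:
    return encode_english(decode_german(sentence))


print(translate("Ich habe Hunger."))  # I am hungry.
print(translate("Mir ist kalt."))     # I am cold.
```

In this toy, all the difficulty has been pushed into the decoder table; in real translation that decoding step is the part that requires understanding and is highly subjective.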

Machine translation, by contrast, still works by a straightforward source-to-target mapping. This assumes that there is somehow a 1:1 correspondence between concepts in one language and concepts in the other.
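
A deliberately crude Python illustration of what a 1:1 assumption does when taken literally: a hypothetical word-for-word German-to-English table. Real systems map longer phrases rather than single words, so this exaggerates the problem; it only shows why a strict one-to-one correspondence breaks on idiomatic sentences.

```python
# Hypothetical German -> English word table, for illustration only.
WORD_MAP = {
    "ich": "I",
    "habe": "have",
    "hunger": "hunger",
    "mir": "me",
    "ist": "is",
    "kalt": "cold",
}


def word_for_word(sentence: str) -> str:
    """Replace each word via WORD_MAP, keeping unknown words unchanged."""
    words = sentence.lower().strip(" .!").split()
    out = " ".join(WORD_MAP.get(w, w) for w in words)
    return out[:1].upper() + out[1:] + "."


print(word_for_word("Ich habe Hunger."))  # I have hunger.
print(word_for_word("Mir ist kalt."))     # Me is cold.
```

Every individual word is mapped "correctly", yet neither output is how an English speaker would say "I am hungry" or "I am cold".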

There are some cases where this can yield OK results: when the languages are very closely related and/or when the material is very technical (e.g. instruction manuals), because in such cases the concepts do tend to align a bit better.

But in general, I think the problem is intractable without solving general AI.