HN Mirror


by bananaface 2151 days ago

I don't feel this way at all. Videos have automatic subtitles that you can automatically translate into the language of your choice; speech recognition is so good that the right tool will let you program with it; and text-to-speech is a button-click away for basically any web page (all I need is an extension). Post-processing for color blindness is amazing, and left-to-right languages render readably on a consistent basis. OCR is progressing dramatically and we're starting to see projects focused on individual users, and automatic image tagging gives textual descriptions of a huge amount of picture content.

We're at a point where a lot of these tools haven't matured in their consumer implementations, but that's coming. It's just a matter of time.

That's all ignoring the soft accessibility of things like iPads that have made computing accessible to Grandma.


Tainnor 2151 days ago

Automatic subtitles for videos in a different language are basically a joke currently.

I agree that we're progressing fast, but fully automated machine translation is IMHO still light-years away (if it is feasible at all). And to automate subtitle generation in a foreign language, you first need speech-to-text, which is also still error-prone, so now you have two sources of errors.
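A back-of-the-envelope way to see why chaining two stages hurts: assuming, purely for illustration, that speech-to-text and translation fail independently on a given line, the chance that a line survives both is the product of the per-stage success rates. The 0.9 figures are hypothetical, not measurements.

```latex
P(\text{line correct}) = P_{\text{STT}} \cdot P_{\text{MT}},
\qquad \text{e.g. } 0.9 \cdot 0.9 = 0.81
```

So two stages that are each "90% right" would leave roughly one line in five wrong.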

We're seeing the uncanny valley problem: by now, things like machine translation are so good for simple use cases that they're being aggressively pushed. At first the output may even appear correct, as if it was done by a human, but then suddenly the translation becomes nonsensical and weird. Even with the well-received DeepL, it's still surprisingly easy to give it some text that it really struggles with.

Incidentally, I remember attending a lecture about 12 years ago by the then-new professor of NLP, who was talking about his success with machine-aided human translation of subtitles from Swedish into Norwegian. Granted, a lot may have improved in 12 years, but it still struck me as impressive that even for languages that closely related, the best they could hope for in a research project was machine-aided translation.


ptx 2151 days ago

Machine translation can never replace real translators, unless we develop an AI with actual understanding.

Even with human-translated texts it's usually noticeable when the translator didn't understand the subject. To make sense of the translated text you then have to try to reverse-engineer the translator's mapping to figure out what the text would have said in the original.

Much like how you can't properly parse HTML using only regular expressions and string substitution, you can't truly translate human languages without understanding. You have to parse the input language, process the meaning of what was said and finally serialize to the target language.
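To make the HTML half of the analogy concrete, here is a small Python sketch (the markup string and the `DepthTracker` class are made up for illustration). A non-greedy regex pairs the outer `<div>` with the first `</div>` it finds, while a parser built on the standard library's `html.parser.HTMLParser` that tracks nesting depth recovers the right span.

```python
import re
from html.parser import HTMLParser

# Hypothetical nested markup: an inner <div> sits inside the outer one.
html = "<div>outer <div>inner <b>bold</b></div> tail</div>"

# Regex approach: the non-greedy match stops at the FIRST </div>,
# which closes the inner div, not the outer one.
naive = re.search(r"<div>(.*?)</div>", html).group(1)


class DepthTracker(HTMLParser):
    """Collect the text inside the outermost <div> by tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


parser = DepthTracker()
parser.feed(html)
parser.close()
parsed = "".join(parser.parts)

print(repr(naive))   # → 'outer <div>inner <b>bold</b>'
print(repr(parsed))  # → 'outer inner bold tail'
```

The regex only sees a flat string of characters; the parser keeps state (the depth counter) that reflects the structure, which is the step the comment argues translation can't skip either.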


henrikschroder 2151 days ago

Subtitling adds even more issues that machine translation simply can't handle, because, like a good book translation, it's an art form.

Making good subtitles means you prioritize readability over accuracy. You have a limited amount of space for your text, and you want to keep the characters-per-second rate low, so you cut words, ruthlessly. But you have to choose which words to cut so that it still makes sense, which means you have to identify filler words so you can cut them, or figure out ways to rephrase something into a shorter sentence.
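A rough Python sketch of only the mechanical part of that job, assuming a hypothetical cap of 17 characters per second and a hypothetical filler-word list (real style guides and subtitlers set their own): measure the rate, then drop fillers until the line fits. Note what it can't do, which is the hard part: rephrase, or tell a filler from the same word used meaningfully.

```python
# Hypothetical reading-speed cap; real style guides define their own limits.
MAX_CPS = 17.0
# Hypothetical filler list; a human subtitler judges fillers in context.
FILLERS = {"well", "so", "like", "just", "really", "actually", "um", "uh"}


def cps(text: str, duration_s: float) -> float:
    """Characters per second, counting every character including spaces."""
    return len(text) / duration_s


def trim_fillers(text: str, duration_s: float, max_cps: float = MAX_CPS) -> str:
    """Drop filler words left to right until the line fits under max_cps.

    Purely mechanical: it cannot rephrase, and it cannot tell a filler from
    the same word used meaningfully ("I just arrived" vs. "a just cause").
    """
    words = text.split()
    i = 0
    while cps(" ".join(words), duration_s) > max_cps and i < len(words):
        if words[i].strip(",.!?").lower() in FILLERS:
            del words[i]  # don't advance i: the next word slides into place
        else:
            i += 1
    return " ".join(words)


line = "Well, I just really think we should actually go now."
print(trim_fillers(line, 2.5))  # → I really think we should actually go now.
```

It stops as soon as the line fits, so "really" and "actually" survive purely because of their position, not because anyone decided they mattered.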

You probably also want to preserve the tone and style of the dialogue, which means you have to choose the right synonyms, not just the most common ones.

And if you're creating hearing-impaired subtitles, it becomes even more necessary to understand what's going on in the video. If someone slams a door center-screen, you can cut that from the subtitles if you have more important things to display, but if someone slams a door off-screen, you absolutely have to include it in the subtitles, because that's the kind of information a hearing-impaired person needs.

Good luck training your little machine-learning network how to identify which sound effects originate from objects on-screen and which originate off-screen...


Tainnor 2151 days ago

I agree in the general sense. The problem is that good human translation works as follows: the translator reads the text, decodes it into some mental representation, and then encodes that representation in the target language. Both decoding and encoding are also highly subjective (which is why works of literature can be translated in many different ways; see, e.g., the many translations of the Bible or the Odyssey).

Machine translation still works by a straightforward source-to-target mapping. This assumes that there is somehow a 1:1 correspondence between concepts in one language and concepts in the other one.
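A deliberately crude Python caricature of that 1:1 assumption. The five-word German-English lexicon is made up for illustration, and real MT systems do far more than look up words one at a time. Translating token by token keeps the German verb-final word order, so the participle lands in the wrong place.

```python
# Hypothetical toy lexicon: exactly one English word per German word.
LEXICON = {
    "ich": "I",
    "habe": "have",
    "das": "the",  # "das" can also mean "that" or "which"; the table picks one
    "buch": "book",
    "gelesen": "read",
}


def word_for_word(sentence: str) -> str:
    """Translate by looking up each token independently (the 1:1 assumption)."""
    tokens = sentence.rstrip(".").split()
    # Unknown words are passed through in angle brackets.
    return " ".join(LEXICON.get(t.lower(), f"<{t}>") for t in tokens) + "."


print(word_for_word("Ich habe das Buch gelesen."))  # → I have the book read.
```

Every word is "correct" in isolation, yet the sentence isn't English ("I have read the book"), because the mapping between the two languages isn't word-to-word.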

There are some cases where this can yield OK results: when the languages are very closely related and/or if the material is very technical (e.g. instruction manuals), because in such cases, the concepts do tend to align a bit better.

But in general, I think the problem is intractable without solving general AI.


nullc 2151 days ago

> left-to-right languages render readably on a consistent basis

‫os epoh dluohs I.
