by Tainnor
2150 days ago
Automatic subtitles for videos in a different language are basically a joke currently. I agree that we're progressing fast, but fully automated machine translation is IMHO still light-years away (if it's feasible at all). And to automate subtitle generation in a foreign language, you first need speech-to-text, which is also still error-prone, so now you have two sources of errors.

We're seeing the uncanny valley problem: machine translation is by now so good for simple use cases that it's being aggressively pushed. At first the output may even appear correct, as if it was done by a human, but then suddenly the translation becomes nonsensical and weird. Even for the well-received DeepL, it's still surprisingly easy to give it some text that it really struggles with.

Incidentally, I remember attending a lecture about 12 years ago by the then-new professor of NLP, who was talking about his success with machine-aided human translation of subtitles from Swedish into Norwegian. Granted, a lot may have improved in 12 years, but it still struck me that even for languages that closely related, the best they could hope for in a research project was machine-aided translation.
Even with human-translated texts, it's usually noticeable when the translator didn't understand the subject. To make sense of the translated text, you then have to reverse-engineer the translator's mapping to figure out what the original would have said.
Much like how you can't properly parse HTML using only regular expressions and string substitution, you can't truly translate human languages without understanding. You have to parse the input language, process the meaning of what was said, and finally serialize to the target language.
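
To make the HTML analogy concrete, here is a minimal Python sketch (my own illustration, not anything from a real translation system). The sample string and the names `regex_div_contents`, `DivTextParser` and `parsed_div_contents` are made up for this example. It shows a naive pattern losing track of nesting, while a parser that tracks structure recovers the full text:

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: one <div> nested inside another.
SAMPLE = "<div>outer <div>inner</div> tail</div>"


def regex_div_contents(html):
    """Naive attempt: take whatever sits between <div> and the next </div>."""
    return re.findall(r"<div>(.*?)</div>", html)


class DivTextParser(HTMLParser):
    """Tracks nesting depth and collects the text of each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                # The outermost <div> just closed: emit its collected text.
                self.results.append("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def parsed_div_contents(html):
    parser = DivTextParser()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(regex_div_contents(SAMPLE))   # stops at the first </div>
    print(parsed_div_contents(SAMPLE))  # follows the nesting to the real end
```

The regex version pairs the outer opening tag with the inner closing tag and silently drops " tail", which is roughly what a translation looks like when it matches surface patterns without a model of the structure underneath.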