by hprotagonist 3183 days ago
In my experience as a user of and developer for speech recognition systems, I have concluded that the best any machine translation system is going to be able to do is translate really basic, imperative communications.

e.g., "Don't eat that!", "We are friendly, don't shoot", "There is food in the kitchen", "Where is the nearest bathroom?"... That will all work relatively well.

Punishing others with Vogon poetry in their own native tongue... never gonna happen.


Google's remarkably good at translating anything that reads like news. Because there are so many news articles written about the same topic in different languages, that data is great. Google is not so good at translating things like love letters, because it's hard to get someone to write one in two languages and publish both.
Certainly the relative paucity of some subject matter to train on is a factor.

The other factor is that complex use of natural language depends on and requires ambiguity, layered meaning, and hosts of other things that are really hard to handle.

I'm not sure love letters are more complex than the news. It often turns out we're not as complex as we think. Or that doing simple calculations on very large amounts of data captures all those nuances. That was the result described in "The Unreasonable Effectiveness of Data".
There's (far) more information in-band in a snippet of language than there is in the "plain" "meaning" of that snippet. All of it is relevant for translation, and none of it is easily tractable.

A fairly trivial example is a pun. Translating highly idiomatic things of this nature turns out to be extraordinarily hard, and just throwing more data at a DNN is not going to get you too far down that road.

That depends on how often that pun appears in the corpus. If you observe the translation enough times, it'll be easy for the computer.
This misses the point. There do exist untranslatable idioms, and puns are an easy example. They rely on language-specific features like rhymes (or sight-rhymes) or homophones that are not preserved across languages.

You can absolutely make a "direct" word-for-word translation of a pun in English to, say, Russian. It's just not a pun any more when you're done. Often there are no "pun equivalents" with totally different words, because usually they hinge on culturally specific references that also don't translate well.

Basically none of this matters when what you're interested in is subway directions or ordering food or whatever, but it becomes intractable really fast whenever you're interested in talking about something more meaningful.

Love letters rely more on emotion, allegory, and subtext than the news, especially modern-era news.
Does that make translation more random? If not, then the real trouble is lack of data.