by patrickxb 3182 days ago
Is this a Babel fish?
7 comments

In my experience as a user of and developer for speech recognition systems, I have concluded that the best any machine translation system is going to be able to do is translate really basic, imperative communications.

E.g., "Don't eat that!" "We are friendly, don't shoot" "There is food in the kitchen" "Where is the nearest bathroom?"... Those will all work relatively well.

Punishing others with Vogon poetry in their own native tongue... never gonna happen.

Google's remarkably good at translating anything that reads like news. Because there are so many news articles written about the same topic in different languages, that data is great. Google is not so good at translating things like love letters, because it's hard to get someone to write one in two languages and publish both.
Certainly the relative paucity of training data for some subject matter is a factor.

The other factor is that complex use of natural language depends on and requires ambiguity, layered meaning, and a host of other factors that are really hard to handle.

I'm not sure love letters are more complex than the news. It often turns out we're not as complex as we think. Or that doing simple calculations on very large amounts of data captures all those nuances. That was the result described in "The Unreasonable Effectiveness of Data".
There's (far) more information in-band in a snippet of language than there is in the "plain" "meaning" of that snippet. All of it is relevant for translation, and none of it is easily tractable.

A fairly trivial example is a pun. Translating highly idiomatic things of this nature turns out to be extraordinarily hard, and just throwing more data at a DNN is not going to get you too far down that road.

That depends on how often that pun appears in the corpus. If you observe the translation enough times, it'll be easy for the computer.
Love letters rely more on emotion, allegory, and subtext than the news, especially modern-era news.
Does that make translation more random? If not, then the real trouble is lack of data.
I wonder if there will be references in the packaging to this. I mean, the original Chromecast referenced HG2G.
It is such a bizarrely improbable coincidence that something so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God.
Too bad that name was already taken. Did Douglas Adams (or his estate) license it out to AltaVista in the first place? Wait, no, it looks like the name is biblical.
I think you mean Douglas Adams.
Yes, well that's embarrassing. Fixed.
At least we have Douglas Adams to thank for the fact they can't patent this!
Seems like.
A Google-fish.