Hacker News
by femto 4131 days ago
The article has this to say about Google Translate's improvements in accuracy:

"Translate has hoovered up gigantic quantities of parallel texts into its database. A particularly fertile source of these useful things, apparently, is the European Union’s set of official publications, which are translated into all Community languages."

The first thing I thought was "what happens when the EU starts outsourcing its translations to Google Translate?"

Is this the future of machine learning? The learning algorithms start by mining a corpus of human output. Once they get good enough, they replace the majority of humans that generated the corpus. We then enter an echo chamber of machines largely feeding off their own output. Consequently, improvement of the machines stagnates, but the machines are still doing a good enough job to keep humans out of a job. We then have a future of "good enough that the cost of improvements can't be justified, but bad enough to be irritating"?

Humans have a sense of pride in their work, and will strive to improve for their own edification, even when the cost outweighs the benefit, or they have been told not to. A machine will just continue to deliver the level of service that the committee in charge tells it to.

Edit: fixed spelling

5 comments

> Humans have a sense of pride in their work, and will strive to improve...

Including the humans who make machine translation algorithms.

My point is that those new algorithms need something to learn from, and the humans that used to do the job are no longer in the game. The original corpus could be reused, but then performance will be bounded by that corpus. If better algorithms are trained on the output of worse algorithms, presumably they just emulate the performance of the worse algorithm. Where do the better algorithms get their input from, if a large scale human effort no longer exists?
But why would every translator stop working or creating new works just because machines can do the job too? I don't think computer written novels will mean people stop telling stories.

Translation is an interpretation of the best phrase to use, and has a subjective element. Imagine trying to translate jokes - it depends on your sense of humor too.

Automation rarely replaces 100% of human workers. What tends to happen is that it replaces 99% of humans, does work that's almost identical to human work, and the 1% of humans left fix the machine work so it is identical to what 100% human workers would have done.

It's already happening in translation. Instead of translating an entire text themselves, human translators first feed the text through translation software. That does a pretty decent job, but still makes mistakes. However, the mistakes are obvious to the human translator, who now just fixes the translation. Most of the work is done by the machine, and a firm with 10 human translators can get rid of 9 of them and still be as productive as it was with all 10 people.

This pattern of automation replacing almost all, but not exactly all, human workers is seen in many industries.

Machine learning researchers aren't going to stop looking for ways to develop better translation algorithms, learned more effectively from the vast volume of data that already exists.
While I don't think diplomats and legislative drafters are going to be put out of a job by Google any time soon, the problem is that if research seems to be yielding diminishing returns, management will often cut off funding, even though the algorithm might just be stuck at a local maximum.
Maybe we will be content to have "good enough," but I'd think that there is enough value in good quality translation that Google and others will pay people to "train" their algorithms and data sets. Instead of waiting until someone tediously translates things and then trying to learn from it, they would have skilled translators skim over the output and correct it where it deviates from perfection, while providing concise feedback as to how it got things wrong, how bad it is, why it is wrong, etc. That is one example of the type of skilled job that I see opening up in the future: the training (not "programming" per se) of robots.

The same thing can happen for, say, training a self-driving car how to merge. A good driver can let the car attempt it itself, but slap it into manual when it is failing, then provide some sort of additional feedback, not so different from a driving instructor teaching a kid to drive.

As robots do more and more things, there will be more and more opportunities for people to train them.

The only scenario where the economics of that doesn't work out is if everyone is unemployed so they can't afford to pay the trainers enough... but the rest of the article doesn't support that.

You're assuming there is a single perfect translation, or even a single perfect translation algorithm across all domains. I'd wager that translating legalese and translating prose will need completely different algorithms trained on them and in the latter case you won't find two translators that fully agree on the best way to translate a given text.
I know translators who use Google Translate for a first pass and then correct the results. Translators' time is limited, so anything that allows them to focus more on the nuances of language is a good thing.