by Etheryte 1445 days ago
I'll believe it when I actually see it. I'm a native speaker of a fairly small language spoken by about a million people, and never have I ever seen a good automatic translation for it. The only translations that are good are the ones that have been entered manually, and those that match the structure of the manually entered ones. I think the sentiment is laudable and wish godspeed to the people working on this, but for the time being I don't see it becoming a reality. When Google Translate regularly struggles even with big pairs such as German-English-German, I have reservations about anyone making it work for languages whose datasets are orders of magnitude smaller.

It's an extremely difficult problem indeed. A lot of people on the team speak low-resource languages too (my native language is one as well!), so we definitely resonate with what you're saying. My overall feeling is: yes, it's hard, and after decades we still can't do German translation perfectly. But if we don't work on it, it's not gonna happen. I really hope that people who are excited about technology for more languages can use what we've open sourced.
> But if we don't work on it, it's not gonna happen.

That’s exactly right. There’s too much bias in society that if something isn’t perfect, then why bother? Nothing is perfect, so with that attitude there can be no progress. Thank you for doing important work!

Personally I'm hoping that globalisation prunes out as many languages as possible before we end up with brain implants automatically translating everything for us and no one can communicate without these chips.
That's a silly thing to wish for: like wishing global warming kills off as many species of animals as possible, in order to simplify zoology.
Becoming bilingual is one thing. Completely extinguishing a language is a totally different matter. It is usually associated with migrating away from the geographic area of the language and/or physically losing speakers (old age, wars, genocides, etc.)

You can check the list at https://en.wikipedia.org/wiki/List_of_languages_by_time_of_e...

To give an example and be blunt: I do not expect any European country's official language to go extinct during our lifetime unless that country gets destroyed, which obviously wouldn't be a good thing.

As for brain implants, I won't hold my breath.

There are several reasons a language can go extinct, and while the reasons you mentioned are among them, there are several others you didn't consider. The most common way language death occurs is through contact with a prestige language, which creates social pressure that makes the non-prestige language less and less commonly spoken[1]. A community becoming bilingual is part of this process: it doesn't always result in language death, but it is an early stage of that form of language death.

As for a European country official language (strange line to draw), Icelandic is already in a state where younger Icelandic natives speak to each other in English because smartphones do not support their native language. It is entirely possible that Icelandic will be in danger of extinction this century[2].

[1]: https://www.youtube.com/watch?v=t3qbYFvOHwk
[2]: https://www.nytimes.com/2017/04/22/world/europe/iceland-icel...

Icelandic won't disappear soon. (It's a bit of a mistake to say Latin disappeared, but I digress.)

It might take some work (and surely more media being produced in it), but I don't think it's at risk.

> because smartphones do not support their native language

Nowhere does the article say that. And most smartphones do support it (including an Icelandic keyboard).

What isn't supported is voice-activated devices:

> "Not being able to speak Icelandic to voice-activated fridges, interactive robots and similar devices would be yet another lost field,"

So, Ukrainian?
> prunes out as many languages as possible

Would you feel the same way if that includes all languages you know, including all versions of English?

Yes, I’d be happy to learn a new language if it were a step toward one unified language. Of course there is no way to back that up, since it will never happen, but I'd at least like to think I'd do it if that were the case.
Would you not feel even slightly unhappy that your children (or grandchildren) would not be able to read the original versions of Shakespeare, Dickens, or Austen or that they wouldn't be able to watch the original versions of movies and shows you've enjoyed? They would only be able to watch and read translations, and all of the linguistic artistry would be lost to them.

It's not just about becoming bilingual, a population becoming bilingual in a "prestige language" is the first stage of language death (though of course it doesn't /always/ lead to language death).

How would you feel about being the one who has to teach your children a language that will hinder their prospects instead of one that will help them succeed, just so that speakers of the bigger language can feel good about themselves for being good people or something?
I could not think of a more obvious way of telling everyone that you are monolingual than implying that knowing another language is a burden. Children are not burdened by being multilingual.

It is true that social pressures kill off local languages, but it's usually not because parents don't want to teach their children their mother tongue, it's that people stop using the language to communicate because of the influence of the "prestige language". My parents (and all of the parents in the immigrant community I live in) went through great pains to teach their children their native language.

I don’t think my prospects are hindered in any way by being a native speaker of a fairly small language. If anything, I prefer having English as a second language.

How would you feel if your kids only learned Chinese[1], and not a word of English?

[1] I’m assuming you’re not Chinese

I'm still a bit disappointed that Esperanto isn't the official language of the EU.
I speak a medium-resource language with 11 million speakers. Google Translate works so poorly with it that translations are often nonsensical. But DeepL works so well with it that its translations are often indistinguishable from those by native speakers. I'm a big believer that the model can make a huge difference.
On the other hand, as a non-native Japanese learner, I find it very obvious when Japanese text has been translated with DeepL, because it often makes 敬語 (keigo, honorific speech)/register and context mistakes (and translating Japanese to English it does even worse, because it struggles with null-subject languages). I am sure a native Japanese speaker would be able to spot even more mistakes than I can.
DeepL seems to handle grammar a bit better (ex. run-on sentences) but for whatever reason, it struggles with basic vocabulary sometimes. Also, when it does make mistakes, they change the meaning subtly enough to render the translation unusable.
Google Translate does the same in many languages, to the point that it will often reverse the meaning of a sentence. I honestly feel like these tools are still mostly useful when you don't really need to know what the text means.
> I ever seen a good automatic translation for it.
>
> When Google Translate regularly struggles even with big pairs such as German-English-German, I have reservations about someone making it work for languages where datasets are orders of magnitude smaller.

I speak a language for which I've never seen any translation... and when it's translated manually, my mum totally butchers the meaning lol.

Either way, any work in this area is more than welcome, but damn it's a hard problem.

There's a section where you can try reading translated children's books. See if your language is supported and how good the translation is.
Burmese and Cambodian are 100% useless on google/bing translate, but the children books translations on the example page are really, really good.
Surprisingly, translations of the books into Russian seem considerably better than into English (at least for the first three books I tried)
There's a large tradition of having texts translated into Russian, whereas English speakers would very rarely read anything translated from another language.
"Tradition" sounds a bit funny knowing that a lot of Russian books were (and are) published without the author's permission. Russian publishing isn't well known for following copyright laws :V
I'm pretty sure that Russian (Soviet?) publishers observed whatever copyright laws there were at the time. It's just that international copyright law is a recent phenomenon. The USA was also a very late signatory to international book copyright treaties AFAIK.