

by kgeist 1445 days ago

I wonder how it differs from what Yandex.Translate did back in 2016: [0]

>The affinity of languages allows one common model to be trained for their translation. That is, “under the hood” of the translator, the same neural network translates into Russian from Yakut, Tatar, Chuvash and other Turkic languages. This approach is called many-to-one, that is, "from many languages into one." This is a more versatile tool than the classic bilingual neural network. And most importantly, it is the many-to-one approach that makes it possible to use knowledge about the structure and vocabulary of the Turkic languages, learned on the rich material of Turkish or Tatar, to translate languages like Chuvash or Yakut, which are less “resource-rich”, but no less important for the cultural diversity of the planet.

>In order to create a unified model for translating Turkic languages, Yandex developed a synthetic common script. Any Turkic language is translated into it, so that, for example, the Tatar “дүрт” (“four”) written in Cyrillic becomes similar to the Turkish dört (“four”), not only from the point of view of a person, but also at the level of similarity of lines for a computer.
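To make the common-script idea concrete, here is a minimal sketch of how mapping Cyrillic Tatar into a Latin-based script raises string-level similarity with Turkish. The four-letter mapping table, the `to_common` and `similarity` helpers, and the use of `difflib` as the similarity measure are all my own assumptions for illustration; the quoted article does not describe Yandex's actual script or metric.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny mapping from Tatar Cyrillic letters to a
# Latin-based "common script". This is NOT Yandex's actual scheme.
TATAR_CYRILLIC_TO_COMMON = {
    "д": "d",
    "ү": "ü",
    "р": "r",
    "т": "t",
}


def to_common(word: str) -> str:
    """Map each known Cyrillic letter to the common script; keep the rest."""
    word = unicodedata.normalize("NFC", word.lower())
    return "".join(TATAR_CYRILLIC_TO_COMMON.get(ch, ch) for ch in word)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's ratio."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    return SequenceMatcher(None, a, b).ratio()


tatar = "дүрт"    # "four" in Tatar, Cyrillic script
turkish = "dört"  # "four" in Turkish, Latin script

print(similarity(tatar, turkish))             # raw scripts share no characters
print(similarity(to_common(tatar), turkish))  # after mapping, only one vowel differs
```

In the raw scripts the two words have no characters in common, so any subword vocabulary built on them would treat the words as unrelated. After the mapping, they differ in a single vowel, which is the kind of overlap a shared model can exploit.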

This way they added support for Turkic and Uralic languages, which are very underrepresented on the Internet. But I don't know what the quality of their translation is: even though I live in a region where Mari (an indigenous Uralic language) is spoken and my wife is Mari, neither of us, sadly, speaks the language.

[0] https://techno-yandex-ru.translate.goog/machine-translation/...

1 comment

hello_im_angela 1445 days ago

We represent all languages in their natural script, rather than transliterating them into a common synthetic one.

Regarding Mari: extremely interesting language, exciting to hear that you are from that region. We are interested in working on this one (likely in the "Hill Mari" variant), but currently do not support it.
