by mabcat
2108 days ago
> any resource recommendations or tips for writing a translation engine

All the successful translation websites/apps you're familiar with use machine learning. ML stomped all over rule-based NLP approaches because it gives a rough translation between so many more languages for so much less work.

On the question of where the data comes from, you might be a bit closer than you think. That dictionary you're transcribing has some sentence pairs, and like yorwba said, sentence pairs are food for ML language models. Extracting all the sentence pairs into a dataset might raise some interest from ML people.
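To make the "extract the sentence pairs" step concrete, here is a minimal sketch, assuming the transcribed dictionary can be loaded as a list of entries that each carry example pairs. The entry layout, the field names (`headword`, `examples`), and the placeholder sentences are all hypothetical; the real transcription format will differ.

```python
import csv
import io

# Hypothetical entry layout: each dictionary entry carries zero or more
# (source sentence, English sentence) example pairs. Placeholder text only.
entries = [
    {"headword": "word_a", "examples": [("source sentence 1", "English sentence 1")]},
    {"headword": "word_b", "examples": []},
    {"headword": "word_c", "examples": [("source sentence 2", "English sentence 2"),
                                          ("source sentence 1", "English sentence 1")]},
]


def extract_pairs(entries):
    """Collect unique, non-empty sentence pairs in first-seen order."""
    seen = set()
    pairs = []
    for entry in entries:
        for src, tgt in entry.get("examples", []):
            pair = (src.strip(), tgt.strip())
            # Skip pairs with an empty side and pairs already collected.
            if all(pair) and pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


def to_tsv(pairs):
    """Serialize pairs as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "english"])
    writer.writerows(pairs)
    return buf.getvalue()


pairs = extract_pairs(entries)
print(to_tsv(pairs))
```

A two-column tab-separated file like this is a common shape for parallel text, so it should be easy for someone else to pick up, but any consistent format would do.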
|
I continually see consumer-facing ML translation (FB, Google) give terrible Vietnamese translations because it assumes all of the context needed for a translation is available in the text. In general this is not the case. In Vietnamese this is hugely obvious because the pronoun system is largely based on third-person relationship terms ("sister walks down the street", "boyfriend loves girlfriend"), which are impossible to map to or from English first- and second-person pronouns ("you walk down the street", "I love you") without basically a full conscious intelligence. Even FB, which is in the unique position of actually having a lot of the requisite relationship data between people available to it, does a terrible job at this.
My (tiny) understanding of the incredibly rich kinship systems in indigenous Australian cultures suggests that this would be a huge issue there as well, assuming these complexities are also present in their languages. (...OP? :) )