by yorwba
1679 days ago
A big problem with this kind of massively multilingual machine learning research is that the researchers in question know almost nothing about most of the languages they're dealing with. For example, they grouped Malayalam with Malay, two unrelated languages. (Though they also say that they focused on languages that get the most translation requests, so maybe this is down to users getting confused about which language they want.)

Their parallel sentence mining project LASER also has problems that are obvious when you know the languages involved. Some time ago I looked at their most confident matches for English-Chinese and briefly thought I was looking at the least confident ones, because Bible quotes were paired with random snippets in Classical Chinese. I think their embedding model was confused by the archaic language.

So I'm glad they also used human evaluators and not just BLEU scores, but I would have really liked to see a human evaluation of their training data. It's possible that the model can average out noise and produce better garbage than the garbage you put in, but it might also get completely confused and produce worse garbage. With their testing setup, it's impossible to tell whether more data or better data is needed to improve the performance of this model.
There is also no way for a reader of the paper to judge the effectiveness of the algorithm. They cite this evaluation of "semantic accuracy", but say nothing about the design of the task, participant selection, or example data.
This paper is pretty much junk science. Even the reference section is amateurishly formatted.