by laserduck
493 days ago
I wonder why they grouped languages from the Middle East and South Asia together. Arabic and Hebrew are Semitic languages, and no language from that family is native to the subcontinent. It would make more sense to group northern South Asian languages like Hindi, Urdu, Bengali, and Nepali with Persian, French, Russian, etc., since those are all Indo-European. South Indian languages like Telugu and Tamil belong to a completely different family (Dravidian). Why not either train the model exclusively on Semitic languages to get better performance on those languages, or train on a wider set of languages for better multilingual performance overall? I don't understand the logic here.
So properly speaking, they should be advertising the target region as Europe, Middle East and Africa. [3]
[1] https://en.wikipedia.org/wiki/Languages_of_Germany
[2] https://en.wikipedia.org/wiki/Languages_of_Afghanistan
[3] https://en.wikipedia.org/wiki/List_of_countries_and_territor...