by hello_im_angela 1447 days ago
sooo real. Many low-resource languages have several natural variants, can be written in multiple scripts, have less written standardization, or are mainly oral. As part of creating our benchmark, FLORES-200, we tried to support languages in multiple scripts (where they are naturally written that way) and explored translating regional variants (such as Moroccan Arabic, not just Arabic).

As an aside, the question of how to think about language standardization is really complex. We wrote some thoughts in Appendix A of our paper: https://research.facebook.com/publications/no-language-left-...