by firethief 2540 days ago
> I've found most of the best resources for actual text in smaller languages come from religious translations, which can be a bit outdated but usually offer a fair amount of text. And beyond that I've gotten lucky with a few Google Drives and other repos mostly containing educational packets put together by various volunteer and missionary groups.

This weird source bias affects ML models; the results can get portentous:

https://www.vice.com/en_us/article/j5npeg/why-is-google-tran...