by yorwba 926 days ago
> maybe we could provide more data.

If you want to contribute by recording yourself speaking Swahili, https://commonvoice.mozilla.org/sw is the place to go. Although Meta has access to much larger data sets, they nonetheless use Common Voice as a "known good" source. E.g. the paper on their SONAR speech encoder reports experiments on Common Voice data, coincidentally involving Swahili: https://ai.meta.com/research/publications/sonar-sentence-lev...