Hacker News
by gagabity 924 days ago
I had pretty terrible results when I tried English -> Swahili. I'm using the Hugging Face M4T V2 Space, and it pretty much doesn't work most of the time: I just get English back with a different voice. Expressive, on the other hand, only seems to support a few languages.

It would be nice if they could lay out exactly what data is missing to make a language work better. While the actual AI bit is out of reach for most of us, maybe we could provide more data.

There is also a 60-second limit, and I wonder whether this is a Hugging Face limitation or a Seamless one.

1 comment

> maybe we could provide more data.

If you want to contribute by recording yourself speaking Swahili, https://commonvoice.mozilla.org/sw is the place to go. Although Meta has access to much larger data sets, they nonetheless use Common Voice as a "known good" source. E.g. the paper on their SONAR speech encoder reports experiments on Common Voice data, coincidentally involving Swahili: https://ai.meta.com/research/publications/sonar-sentence-lev...