by synesthesiam 1457 days ago
Hi all, author here. Besides the tech of Mimic 3 itself, I'm interested in training voices in as many (human) languages as possible. All it takes is one person willing to donate a dataset for everyone to benefit!

...well, that and a bunch of stuff with phonemes. But I'll do that part :)

5 comments

Can't you use the Mozilla Common Voice dataset for that?
The Mozilla Common Voice dataset is awesome. However, it's useful for the opposite purpose: speech-to-text. This is because it consists of many different people using a range of hardware, speaking similar phrases.

For good text-to-speech you need one person speaking different phrases, but very consistently. Here's an example dataset from Thorsten, a German open voice enthusiast: https://openslr.org/95/

Thanks for the explanation!
What does it take to add Chinese and Japanese to this? Surely it's a lot more than just training sets, right? I have an Android phone without access to Google TTS, so this might actually be a nice alternative.
How can people contribute? I'd be happy to sit in front of a microphone for a while if I could use my own voice in a TTS engine!
They want you to make good-quality audio recordings of yourself speaking about 20,000 phrases. It could take 40 to 80 hours of speaking and recording, at a maximum of 4 hours per day.

https://github.com/MycroftAI/mimic-recording-studio

https://mycroft.ai/contribute/
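
For a rough sense of what that commitment means, here is a back-of-the-envelope sketch in Python. It assumes only the figures quoted above (about 20,000 phrases, 40 to 80 hours in total, at most 4 hours per day); the function names are made up for illustration and are not part of any Mycroft tool.

```python
import math

# Figures quoted in the comment above; treat them as rough estimates.
PHRASES = 20_000
MAX_HOURS_PER_DAY = 4.0


def recording_days(total_hours: float, max_hours_per_day: float = MAX_HOURS_PER_DAY) -> int:
    """Minimum number of recording days when every day hits the daily cap."""
    return math.ceil(total_hours / max_hours_per_day)


def seconds_per_phrase(total_hours: float, phrases: int = PHRASES) -> float:
    """Average time budget per phrase, including retakes and pauses."""
    return total_hours * 3600 / phrases


for hours in (40, 80):
    print(
        f"{hours} h total: at least {recording_days(hours)} days, "
        f"about {seconds_per_phrase(hours):.1f} s per phrase"
    )
```

So the quoted range comes to roughly 10 to 20 full recording days, or somewhere around 7 to 14 seconds per phrase on average.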

The amount of data depends on whether there's already a voice for the language. If so, about 2 hours of data is usually good enough. Otherwise, 10-20 hours usually does it.
Where could I donate my voice?
What kind of workload are we looking at? And do you care for the Australian accent?
Bloody oath we do!
Translation: "Yes"

... Hi from Darwin :D