|
|
|
|
|
by youka
2298 days ago
|
|
You are close enough, so I have to respect my word. I’m not a genius, just a lego builder, I’ve tried a lot of methods, from DL to ML but aeneas project (with some optimizations) gave me the best results. Amazing project and even better personality. Take a look at https://github.com/readbeyond/aeneas
Together with espeak-ng, you can get good results for line level alignment for 108 languages. |
|
Espeak-ng supporting 108 languages is maybe a bit misleading. They have pronunciation definitions for many languages, but the actual level of support varies widely.
For Mandarin, espeak-ng 1.49.2 has a bug where it reads the tone numbers out loud instead of modifying the pitch contour, so e.g. the number 四 (four) is pronounced si si instead of sì, because it has the fourth tone. That's the version packaged for Ubuntu, so you may be using it for your API.
For Japanese, kanji aren't supported at all, so 四 is pronounced as "Chinese letter" (in English). For proper Japanese support, you'd need to switch to a different TTS engine like Open JTalk or preprocess the text to transform it into kana.
Also note that Aeneas is licensed under AGPL, which requires you to offer the source code if you let others interact with the program over a network (which is what your API does). So your attempt to keep the secret sauce private and only reveal it once someone guessed the algorithm was likely illegal. You should add proper copyright notices to your program and audioai.online