by archerx
513 days ago
I have been experimenting with Piper TTS recently. It's free, open source, and fast, and it has a lot of voices in different languages. The quality is not the best, but it's still good enough for most cases. https://rhasspy.github.io/piper-samples/
The rhythm and timing in particular are often very jarring, which makes words difficult to understand, especially when the pitch is not quite right.
It also doesn't seem to know about pacing: it ignores semicolons and commas.
Combined, these issues mean I often need to think hard about what it just said, or even listen to it again.
I also notice these issues in the various English voice models to varying degrees, so it seems to be an inherent problem. Or can it be improved significantly by training a model yourself?