by magicalhippo
513 days ago
For my native language, Norwegian, Piper TTS is at best "usable", and sometimes a fair bit worse than that, at least in its default form[1]. The rhythm and timing in particular are often very jarring, which makes words difficult to understand, especially when the pitch is not quite right. It also doesn't seem to know about pacing: it ignores semicolons and commas. Combined, this means I often need to think hard about what it just said, or even listen to it again.

I also notice these issues to varying degrees in the various English voice models, so it seems to be an inherent problem. Or can it be improved significantly by training it yourself?

[1]: https://rhasspy.github.io/piper-samples/
|
I’m sure it’s possible to train new voices.
The English voices are hit or miss, but some voice models have up to 900 speakers, so you should be able to find a nice voice in the haystack.
The thing I like about Piper is that it is so fast. I set it up to stream the output to VLC, and it starts speaking in less than a second, even on my laptop.
I wish it had ElevenLabs quality, but right now speed is the most important factor for what I’m doing with it.
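
For anyone who wants to try the streaming setup, here is a rough sketch of the idea, not my exact configuration. It pipes Piper's raw audio into `aplay` (as in the Piper README) instead of VLC, so it assumes Linux with ALSA and a `piper` binary on the PATH. The model filename, the 22050 Hz sample rate, and the helper names `piper_cmd`, `player_cmd`, and `speak` are assumptions for illustration; check your voice's `.onnx.json` for the real sample rate.

```python
import subprocess

# Assumption: many medium-quality Piper voices use 22050 Hz, 16-bit mono.
SAMPLE_RATE = 22050


def piper_cmd(model_path, speaker=None):
    """Build the Piper command that writes raw PCM audio to stdout."""
    cmd = ["piper", "--model", model_path, "--output-raw"]
    if speaker is not None:
        # Multi-speaker models let you pick a voice by numeric speaker id.
        cmd += ["--speaker", str(speaker)]
    return cmd


def player_cmd(sample_rate=SAMPLE_RATE):
    """Build the aplay command that plays raw 16-bit PCM from stdin."""
    return ["aplay", "-r", str(sample_rate), "-f", "S16_LE", "-t", "raw", "-"]


def speak(text, model_path, speaker=None):
    """Stream synthesized speech to the player as Piper produces it."""
    tts = subprocess.Popen(
        piper_cmd(model_path, speaker),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    player = subprocess.Popen(player_cmd(), stdin=tts.stdout)
    tts.stdout.close()  # the player now owns the read end of the pipe
    tts.stdin.write(text.encode("utf-8") + b"\n")
    tts.stdin.close()  # signal end of input so Piper can finish
    player.wait()
    tts.wait()


if __name__ == "__main__":
    # Hypothetical model path; replace with a voice you have downloaded.
    speak("This starts playing before synthesis has finished.",
          "en_US-lessac-medium.onnx")
```

Because playback starts as soon as the first audio chunk arrives, the perceived latency is much lower than writing a WAV file first and then opening it.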