|
|
|
|
|
by JaRail
2412 days ago
|
|
I'm not really sure where you're getting this. It doesn't pick a specific voice from a database to use. From their introduction: "Our approach is to decouple speaker modeling from speech synthesis by independently training a speaker-discriminative embedding network that captures the space of speaker characteristics and training a high quality TTS model on a smaller dataset conditioned on the representation learned by the first network." Section 2 of the paper explains how it works. Two minute papers also goes through it if you'd prefer a video. Link: https://youtu.be/0sR1rU3gLzQ |
|