|
|
|
|
|
by nielsinho
1231 days ago
|
|
It uses the TorToiSe TTS model for generation. It's simple to generate conditioning voice latents using short audio samples. Likely transcribed JRE episodes were part of the TorToiSe training data, explaining how it's so good at recreating his voice characteristics in particular. |
|