by belevtsoff
2400 days ago
The audio is also generated. We used speech-to-speech voice conversion for this, so it is indeed more involved than, say, TTS, but also more expressive and controllable. Here's another example: https://youtu.be/t5yw5cR79VA