Hacker News
by belevtsoff 2400 days ago
The audio is also generated. We used speech2speech voice conversion for this, so it is indeed more involved than, say, TTS, but also more expressive and controllable. Here's another example: https://youtu.be/t5yw5cR79VA