by danielbln 1052 days ago
OP also clearly hasn't used ElevenLabs or similar tools. If you clone a professional narrator, it already sounds incredibly good and is effectively indistinguishable from a human. Giving acting directions to the model to steer the output (kind of like ControlNet does for Stable Diffusion) seems like a logical next step.

But in this case they want to avoid human input. So I guess it would instead work by reading and copying the intonation of the source voice.