Hacker News
by andberx 59 days ago
This is really cool. Voice cloning + translation in one pipeline is something a lot of content creators would pay for right now, especially for YouTube dubbing, where you want to keep the original personality of the speaker.

Are you handling the speech-to-text, translation, and voice synthesis as separate steps or is it more of an end-to-end model? Curious how you deal with things like pacing and intonation that don't always carry over between languages.