Hacker News
Text to Speech from voices with ~15 minutes of Audio from YouTube videos (youtube.com)
5 points by sentdex 2537 days ago
2 comments

Elon sounded like he was talking backward, but I think you're definitely on the right path. Really interesting.
Using transfer learning on top of a DCTTS (deep convolutional text-to-speech) model, I wanted to see how quickly one could recreate voices even remotely convincingly.

TLDR/W: using ~15 minutes of audio and about 1.5 hours of training, I was able to create what I think are pretty good imitations of my own voice and the voices of Donald Trump, Obama, Musk, and Joe Rogan.

None are perfect, and this is still very much a work in progress, but it's something you might want to be aware exists now (and has for years).

Even if you don't post videos of yourself on YouTube, your audio is almost certainly stored, tagged with your name, by Google (Assistant), Apple (Siri), Amazon (Alexa), and probably many other providers.